Which GPUs to get for Mathematical Optimization (if any)?




The Machine Learning community has largely benefited from modern GPUs and several large companies are investing in new dedicated hardware.



Unfortunately, academic and commercial mathematical optimization solvers still lack support for GPUs. They do support distributed or shared-memory computing environments (e.g., see the Ubiquity Generator framework from ZIB), but it looks like GPUs raise different technical challenges for (discrete) math optimizers.



Here are my two questions:



  1. Which GPU, if any, should I get for mathematical optimization?

  2. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?









Tags: solver parallel-computing gpu accelerated-hardware






asked Jul 17 at 8:13 by Stefano Gualandi; edited Jul 18 at 5:05 by Rodrigo de Azevedo























          5 Answers


















Answer (score 14) by Geoffrey De Smet, answered Jul 17 at 8:27:

I've not seen any efficient use of GPUs for metaheuristics - only experiments that proved their inefficiency for these algorithms. So not the right tool for the job, apparently. Maybe there's an undiscovered technique to make them work efficiently. (I have seen/built efficient use of multiple CPU cores for metaheuristics, even on Local Search with incremental fitness calculation.)



Let me define efficient use as: in an apples-to-apples comparison on multiple non-trivial datasets, running for the same amount of time (ideally a few minutes), the GPU strategy isn't dominated by the non-GPU strategy.
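As a concrete (and hedged) illustration of the parenthetical above, here is a minimal Python sketch of multi-start local search spread over CPU cores with multiprocessing, using an incremental (delta) fitness update per move. The objective and move operator are toy placeholders, not taken from the answer or any real metaheuristic library.

```python
# Minimal sketch: multi-start local search on multiple CPU cores,
# with an incremental fitness update per move (toy objective).
import random
from multiprocessing import Pool

def local_search(seed, n=200, iters=50_000):
    """Hill-climb a random 0/1 vector against a random linear objective."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # toy objective weights
    x = [rng.randint(0, 1) for _ in range(n)]
    score = sum(wi * xi for wi, xi in zip(w, x))
    for _ in range(iters):
        i = rng.randrange(n)
        delta = w[i] * (1 - 2 * x[i])    # incremental change if bit i is flipped
        if delta > 0:                    # accept improving moves only
            x[i] ^= 1
            score += delta
    return score

if __name__ == "__main__":
    with Pool() as pool:                 # one independent restart per worker
        results = pool.map(local_search, range(8))
    print("best of 8 restarts:", max(results))
```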




























Answer (score 13) by JakobS, answered Jul 17 at 14:06, edited Jul 17 at 19:22:

If your problem is continuous, I would say that it might be beneficial. For problems that involve discrete variables, I've not seen anything that benefits from the use of a GPU.



GPUs aid problem solving if the underlying problem has a structure that can exploit the massively parallel architecture of the graphics processing unit. Calculations that involve large matrices are a good example. Deep learning is able to make very good use of GPUs because its computations can (roughly) be written as matrix-vector operations. There exist specialized BLAS and LAPACK implementations for GPUs (e.g. https://developer.nvidia.com/cublas), so algorithms making use of these are likely to see a speedup.

On the other hand, algorithms that exhibit inherent "branches" (i.e. decisions) within their flow cannot use the parallel computation capabilities of GPUs, because with each branch the problem changes a bit.
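To make the dense-matrix case above concrete, here is a minimal sketch (not from the answer) that dispatches the same matrix product to the CPU (NumPy and the host BLAS) and to the GPU (CuPy, which calls cuBLAS under the hood). It assumes a CUDA-capable GPU and the cupy package.

```python
# Minimal sketch: dense matrix-matrix product on CPU vs GPU (via cuBLAS/CuPy).
import time

import numpy as np
import cupy as cp

n = 4096
A_cpu = np.random.rand(n, n).astype(np.float32)
B_cpu = np.random.rand(n, n).astype(np.float32)

# CPU baseline.
t0 = time.perf_counter()
C_cpu = A_cpu @ B_cpu
cpu_time = time.perf_counter() - t0

# GPU version: same operation, dispatched to cuBLAS by CuPy.
A_gpu = cp.asarray(A_cpu)
B_gpu = cp.asarray(B_cpu)
cp.cuda.Stream.null.synchronize()          # make sure transfers are done
t0 = time.perf_counter()
C_gpu = A_gpu @ B_gpu
cp.cuda.Stream.null.synchronize()          # wait for the kernel to finish
gpu_time = time.perf_counter() - t0

print(f"CPU: {cpu_time:.3f}s  GPU: {gpu_time:.3f}s")
print("max |diff|:", float(cp.abs(C_gpu - cp.asarray(C_cpu)).max()))
```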



            So to answer your question(s):



            1. If your problem involves discrete decisions: You won't get any benefit from having a potent GPU.

              If your problem involves only continuous variables: It might be beneficial - but I've not seen any solver that claims to specifically exploit GPUs.

            2. I'm not aware of any.



























Answer (score 11) by Mathieu B, answered Jul 17 at 15:49, edited Jul 17 at 16:02 by Kevin Dalmeijer:

The first algorithm that comes to mind as able to benefit from GPUs is the Interior-Point Method (IPM); at its heart is the solution of a linear system. See these references:




1. GPU Acceleration of the Matrix-Free Interior Point Method

2. Cholesky Decomposition and Linear Programming on a GPU
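To make the connection concrete, here is a minimal sketch (mine, not taken from the referenced papers) of the linear-algebra kernel an IPM iteration spends most of its time in: forming the normal-equations matrix A D A^T and solving it by a Cholesky factorization on the GPU. It assumes CuPy and a CUDA GPU, and uses a dense random matrix purely for illustration.

```python
# Minimal sketch of the linear algebra inside one IPM iteration:
# form the (dense, for illustration) normal equations  M = A D A^T
# and solve  M dy = r  with a Cholesky factorization on the GPU.
import cupy as cp
from cupyx.scipy.linalg import solve_triangular

m, n = 500, 2000
A = cp.random.rand(m, n)                 # constraint matrix (dense here)
d = cp.random.rand(n) + 0.1              # positive scaling from the barrier term
r = cp.random.rand(m)                    # current right-hand side

M = (A * d) @ A.T                        # A @ diag(d) @ A.T without forming diag(d)
L = cp.linalg.cholesky(M)                # lower-triangular Cholesky factor
y = solve_triangular(L, r, lower=True)   # forward substitution
dy = solve_triangular(L.T, y, lower=False)  # back substitution

print("residual:", float(cp.linalg.norm(M @ dy - r)))  # should be ~0
```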



























Answer (score 9) by Brian Borchers, answered Jul 18 at 16:08, edited Jul 19 at 3:55:

                A lot depends on what kinds of computations you are doing. The subject of this group is "Operations Research", but that surely includes a range of computational work including discrete event simulation, machine learning, linear and nonlinear programming, discrete optimization, etc. There's no one answer applicable to all of those kinds of problems.



                For linear and nonlinear programming, one important issue is that nearly all of these computations are typically performed in double precision rather than single precision.



NVIDIA has adopted a strategy in which different models of their GPUs are optimized for different uses and priced differently. Except for the Tesla line of GPUs aimed at high-performance computing applications, most of NVIDIA's GPUs are configured so that double precision is much (e.g. 32 times) slower than single precision. This means that these other models are poorly suited for double-precision floating-point computations.
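A quick way to see this on whatever card you have (a hedged sketch, assuming CuPy and a CUDA GPU; not part of the original answer): time the same matrix product in single and double precision and look at the ratio.

```python
# Minimal sketch: compare FP32 vs FP64 throughput on the same GPU.
import time
import cupy as cp

def timed_matmul(dtype, n=4096, repeats=3):
    A = cp.random.rand(n, n).astype(dtype)
    B = cp.random.rand(n, n).astype(dtype)
    cp.cuda.Stream.null.synchronize()      # exclude transfer/setup time
    t0 = time.perf_counter()
    for _ in range(repeats):
        A @ B
    cp.cuda.Stream.null.synchronize()      # wait for all kernels to finish
    return (time.perf_counter() - t0) / repeats

t32 = timed_matmul(cp.float32)
t64 = timed_matmul(cp.float64)
print(f"FP32: {t32:.3f}s  FP64: {t64:.3f}s  slowdown: {t64 / t32:.1f}x")
```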



In contrast, most machine learning computing can be done in single precision (or even half precision). The inexpensive consumer-oriented GPUs sold by NVIDIA perform incredibly well on these kinds of computations.



Another important issue in linear algebra computations is whether the matrices that you're working with are sparse (have lots of zero entries) or dense (all or nearly all entries are nonzero). GPUs excel at dense matrix linear algebra but don't perform quite so well with sparse matrices. Nearly all linear programming models have sparse constraint matrices, and this is exploited by both simplex and interior-point solvers. Thus GPUs have not been very successful in linear programming (you'll notice that neither CPLEX nor Gurobi works with GPUs). The situation with nonlinear programming is somewhat more varied.
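To illustrate the sparsity point (a minimal sketch assuming CuPy and SciPy; not from the original answer): a sparse matrix-vector product in SciPy on the CPU versus cupyx.scipy.sparse on the GPU. For matrices as sparse as typical LP constraint matrices, the GPU advantage is usually much smaller than in the dense case, and can disappear entirely.

```python
# Minimal sketch: CPU vs GPU sparse matrix-vector product (CSR format).
import time
import numpy as np
import scipy.sparse as sp
import cupy as cp
import cupyx.scipy.sparse as cusparse

n = 100_000
A_cpu = sp.random(n, n, density=1e-4, format="csr", dtype=np.float64)
x_cpu = np.random.rand(n)
A_gpu = cusparse.csr_matrix(A_cpu)       # copy the CSR structure to the GPU
x_gpu = cp.asarray(x_cpu)

def timed(op, sync=lambda: None, repeats=20):
    sync()
    t0 = time.perf_counter()
    for _ in range(repeats):
        op()
    sync()
    return (time.perf_counter() - t0) / repeats

print("CPU SpMV:", timed(lambda: A_cpu @ x_cpu))
print("GPU SpMV:", timed(lambda: A_gpu @ x_gpu,
                         sync=cp.cuda.Stream.null.synchronize))
```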
















Comments:

• Stefano Gualandi (Jul 18 at 16:13): Thanks Brian! You're right, even in my limited experience with GPU, dealing with single or double precision makes a huge difference with GPU speed.

• JakobS (Jul 18 at 21:39): Very good point mentioning the sparsity of the problem.


















Answer (score 6) by Rob:


                1. Which GPU, if any, should I get for mathematical optimization?



In the case of commercially available software, where no source code is available, you are stuck using the GPU that is best supported by the applications you intend to run.



                • AmgX, cuSOLVER and nvGRAPH all require Nvidia GPUs, and offer supporting articles on their blog.


                • Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. It is written to use CUDA.



                • Hyperlearn requires CUDA. Offers GPU acceleration of:



                  • Matrix Completion algorithms - Non Negative Least Squares, NNMF

• Batch Similarity Latent Dirichlet Allocation (BS-LDA)

                  • Correlation Regression

                  • Feasible Generalized Least Squares FGLS

                  • Outlier Tolerant Regression

                  • Multidimensional Spline Regression

                  • Generalized MICE (any model drop in replacement)

                  • Using Uber's Pyro for Bayesian Deep Learning


                • Matlab only supports GPU acceleration on Nvidia GPUs when using the Parallel Computing Toolbox, otherwise any graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended.


                • Pagmo2 supports both Nvidia and AMD GPU acceleration. Pagmo (C++) or pygmo (Python) is a scientific library for massively parallel optimization. It is built around the idea of providing a unified interface to optimization algorithms and to optimization problems and to make their deployment in massively parallel environments easy. A short list of some papers from the European Space Agency where pagmo was utilized.


• Python has a number of libraries that support CUDA, but there is not as much support for AMD GPUs and OpenCL; some libraries, such as Numba, support both GPU manufacturers, but Nvidia certainly blogs about it more (a small Numba kernel sketch follows after this list).


                • scikit-CUDA provides Python interfaces to many of the functions in the CUDA device/runtime, cuBLAS, cuFFT, and cuSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided.


                • SuiteSparse libraries for sparse matrix operations on Nvidia GPUs.


• Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once, Theano can minimize the amount of compilation/analysis overhead, but still provide symbolic features such as automatic differentiation. It requires CUDA.


                • ViennaCL provides CUDA, OpenCL and OpenMP computing backends. It enables simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common sparse and dense linear algebra operations (BLAS levels 1, 2 and 3). It also provides iterative solvers with optional preconditioners for large systems of equations.
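As a small illustration of the Python route mentioned in the list above, here is a minimal Numba CUDA kernel sketch (mine, not from the original answer). It assumes the numba package and an Nvidia GPU; Numba compiles the decorated function into a GPU kernel and handles the host/device array transfers.

```python
# Minimal sketch: a hand-written GPU kernel via Numba's CUDA target.
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)                 # global thread index
    if i < x.shape[0]:               # guard against out-of-range threads
        out[i] = a * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)
out = np.zeros_like(x)

threads = 256
blocks = (n + threads - 1) // threads
saxpy[blocks, threads](2.0, x, y, out)   # Numba copies arrays to/from the GPU

assert np.allclose(out, 2.0 * x + y)
```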


A good website for benchmarks for FP64, FP32 and FP16 is Lambda Labs; one article in particular ("Deep Learning GPU Benchmarks - Tesla V100 vs RTX 2080 Ti vs GTX 1080 Ti vs Titan V") offers a great bottom line on what you get and how much it costs. Don't let the DL slant discourage you: DL can be used for optimization (fast results, not guaranteed to be absolutely optimal), either as a starting point for your variables or as a final result. My purpose in mentioning this article is to cite these quotes:




                "Results summary



                As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning research on a single GPU system running TensorFlow. A typical single GPU system with this GPU will be:



                • 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more expensive.

                • 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more expensive.

                • 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.

                • 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.

                Note that all experiments utilized Tensor Cores when available and are priced out on a complete single GPU system cost. As a system builder and AI research company, we're trying to make benchmarks that are scientific, reproducible, correlate with real world training scenarios, and have accurate prices. So, we've decided to make the spreadsheet that generated our graphs and (performance / $) tables public."



                ...



                "2080 Ti vs V100 - is the 2080 Ti really that fast?



                How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? The answer is simple: NVIDIA wants to segment the market so that those with high willingness to pay (hyper scalers) only buy their TESLA line of cards which retail for ~$9,800. The RTX and GTX series of cards still offers the best performance per dollar.



                If you're not AWS, Azure, or Google Cloud then you're probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:



                • If you need FP64 compute. If you're doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you'll need to buy the Titan V or V100s. If you're not sure if you need FP64, you don't. You would know.

                • If you absolutely need 32 GB of memory because your model size won't fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can't fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.

                So. You're still wondering. Why would anybody buy the V100? It comes down to marketing.



                2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron



                The V100 is a bit like a Bugatti Veyron. It's one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can't afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It's very fast, handles well, expensive but not ostentatious, and with the same amount of money you'd pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement. [Rob's note: costs are different for him compared to my calculations.]



                And if you think I'm going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.".




                Thus, you want to pick a GPU manufacturer that provides better benchmarks for the programs you want to run, unless you have the source code and possess some GPU tweaking skills.



The best deal is probably the AMD Radeon VII, with its FP64 rate of 1/4 for only US$700; and even though it's new, it's also being discontinued, so there may be some price drops coming. Unfortunately, while the hardware is probably a better deal for many people, the software that can wring that performance out of it is far less plentiful and not as well developed as what's available for an Nvidia card.




2. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?



                All of the above links list software that benefits from more GPU cores, even if they are spread across multiple cards, multiple machines or even cloud GPU computing in some cases.



An often-quoted article about selecting a GPU and using multiple GPUs is "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning" (2019-04-03) by Tim Dettmers. While its focus is on Deep Learning, it provides an excellent explanation of the difficulties and the performance increase to be expected when using multiple GPUs.



                He also says something about the usage of GPUs, in general (where applicable), but again it's in reference to DL (though still applicable to OR optimization):




                "Overall I think I still cannot give a clear recommendation for AMD GPUs for ordinary users that just want their GPUs to work smoothly. More experienced users should have fewer problems and by supporting AMD GPUs and ROCm/HIP developers they contribute to the combat against the monopoly position of NVIDIA as this will greatly benefit everyone in the long-term. If you are a GPU developer and want to make important contributions to GPU computing, then an AMD GPU might be the best way to make a good impact over the long-term. For everyone else, NVIDIA GPUs might be the safer choice.".




                Articles about using GPUs for Operations Research:



                • GPU Computing Applied to Linear and Mixed Integer Programming


                • GPU computing in discrete optimization. Part II: Survey focused on routing problems


                • gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters


                I'll return later to expand this answer.
















Comments:

• Stefano Gualandi (Jul 22 at 5:36): Thanks. The post by Tim Dettmers inspired my question.

• Rob (Jul 22 at 5:45): @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), and how much memory (can you afford, over paying for performance)? And what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example? Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.

• Stefano Gualandi (Jul 22 at 8:59): The last two papers you linked are interesting, but I need to find the time to read them carefully.

• JakobS (Jul 22 at 10:49): Indeed the last two papers look interesting. I browsed through the first one, which is a meta-analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016, with many cited papers from the early 2010s), but even at that time the results, at least for exact approaches and simplex, look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.














                                        9














                                        9










                                        9







                                        $begingroup$

                                        A lot depends on what kinds of computations you are doing. The subject of this group is "Operations Research", but that surely includes a range of computational work including discrete event simulation, machine learning, linear and nonlinear programming, discrete optimization, etc. There's no one answer applicable to all of those kinds of problems.



                                        For linear and nonlinear programming, one important issue is that nearly all of these computations are typically performed in double precision rather than single precision.



                                        NVIDIA has adopted a strategy in which different models of their GPU's are optimized for different uses and priced differently. Except for the Tesla line of GPU's aimed at high performance computing applications, most of NVIDIA's GPU's are configured so that double precision is much (e.g. 32 times) slower than single precision. This means that these other models are poorly suited for double precision floating point computations.



                                        In contrast, most machine learning computing can be done in single precision (or even half precision.) The inexpensive consumer oriented GPU's sold by NVIDIA perform incredibly well on these kinds of computations.



                                        Another important issue in linear algebra computations is whether the matrices that you're working with are sparse (have lots of zero entries) or dense (all or nearly all entries are nonzero.) GPU's excel at dense matrix linear algebra but don't perform quite so well with sparse matrices. Nearly all linear programming models have sparse constraint matrices and this is exploited by both simplex and interior point solvers. Thus GPU's have not been very successful in linear programming (you'll notice that neither CPLEX nor GuRoBi work with GPU's.) The situation with nonlinear programming is somewhat more varied.






                                        share|improve this answer












                                        $endgroup$



                                        A lot depends on what kinds of computations you are doing. The subject of this group is "Operations Research", but that surely includes a range of computational work including discrete event simulation, machine learning, linear and nonlinear programming, discrete optimization, etc. There's no one answer applicable to all of those kinds of problems.



                                        For linear and nonlinear programming, one important issue is that nearly all of these computations are typically performed in double precision rather than single precision.



                                        NVIDIA has adopted a strategy in which different models of their GPU's are optimized for different uses and priced differently. Except for the Tesla line of GPU's aimed at high performance computing applications, most of NVIDIA's GPU's are configured so that double precision is much (e.g. 32 times) slower than single precision. This means that these other models are poorly suited for double precision floating point computations.



                                        In contrast, most machine learning computing can be done in single precision (or even half precision.) The inexpensive consumer oriented GPU's sold by NVIDIA perform incredibly well on these kinds of computations.



                                        Another important issue in linear algebra computations is whether the matrices that you're working with are sparse (have lots of zero entries) or dense (all or nearly all entries are nonzero.) GPU's excel at dense matrix linear algebra but don't perform quite so well with sparse matrices. Nearly all linear programming models have sparse constraint matrices and this is exploited by both simplex and interior point solvers. Thus GPU's have not been very successful in linear programming (you'll notice that neither CPLEX nor GuRoBi work with GPU's.) The situation with nonlinear programming is somewhat more varied.







                                        share|improve this answer















                                        share|improve this answer




                                        share|improve this answer








                                        edited Jul 19 at 3:55

























                                        answered Jul 18 at 16:08









                                        Brian BorchersBrian Borchers

                                        3461 silver badge7 bronze badges




                                        3461 silver badge7 bronze badges










                                        • 1




                                          $begingroup$
                                          Thanks Brian! You're right, even in my limited experience with GPU, dealing with single or double precision makes a huge difference with GPU speed.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 18 at 16:13






                                        • 2




                                          $begingroup$
                                          Very good point mentioning the sparsity of the problem.
                                          $endgroup$
                                          – JakobS
                                          Jul 18 at 21:39












                                        • 1




                                          $begingroup$
                                          Thanks Brian! You're right, even in my limited experience with GPU, dealing with single or double precision makes a huge difference with GPU speed.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 18 at 16:13






                                        • 2




                                          $begingroup$
                                          Very good point mentioning the sparsity of the problem.
                                          $endgroup$
                                          – JakobS
                                          Jul 18 at 21:39







                                        1




                                        1




                                        $begingroup$
                                        Thanks Brian! You're right, even in my limited experience with GPU, dealing with single or double precision makes a huge difference with GPU speed.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 18 at 16:13




                                        $begingroup$
                                        Thanks Brian! You're right, even in my limited experience with GPU, dealing with single or double precision makes a huge difference with GPU speed.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 18 at 16:13




                                        2




                                        2




                                        $begingroup$
                                        Very good point mentioning the sparsity of the problem.
                                        $endgroup$
                                        – JakobS
                                        Jul 18 at 21:39




                                        $begingroup$
                                        Very good point mentioning the sparsity of the problem.
                                        $endgroup$
                                        – JakobS
                                        Jul 18 at 21:39











                                        6
















                                        $begingroup$


                                        1. Which GPU, if any, should I get for mathematical optimization?



                                        In the case of commercially available software, where no source code is available, you are stuck using the GPU that is better supported by the applications you intend to run.



                                        • AmgX, cuSOLVER and nvGRAPH all require Nvidia GPUs, and offer supporting articles on their blog.


                                        • Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. It is written to use CUDA.



                                        • Hyperlearn requires CUDA. Offers GPU acceleration of:



                                          • Matrix Completion algorithms - Non Negative Least Squares, NNMF

                                          • Batch Similarity Latent Dirichelt Allocation (BS-LDA)

                                          • Correlation Regression

                                          • Feasible Generalized Least Squares FGLS

                                          • Outlier Tolerant Regression

                                          • Multidimensional Spline Regression

                                          • Generalized MICE (any model drop in replacement)

                                          • Using Uber's Pyro for Bayesian Deep Learning


                                        • Matlab only supports GPU acceleration on Nvidia GPUs when using the Parallel Computing Toolbox, otherwise any graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended.


                                        • Pagmo2 supports both Nvidia and AMD GPU acceleration. Pagmo (C++) or pygmo (Python) is a scientific library for massively parallel optimization. It is built around the idea of providing a unified interface to optimization algorithms and to optimization problems and to make their deployment in massively parallel environments easy. A short list of some papers from the European Space Agency where pagmo was utilized.


• Python has a number of libraries that support CUDA, but far less support for AMD GPUs and OpenCL; some libraries, such as Numba, support both GPU vendors, though Nvidia certainly blogs about it more (see the short sketch after this list).


                                        • scikit-CUDA provides Python interfaces to many of the functions in the CUDA device/runtime, cuBLAS, cuFFT, and cuSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided.


                                        • SuiteSparse libraries for sparse matrix operations on Nvidia GPUs.


• Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once, Theano can minimize the amount of compilation/analysis overhead but still provide symbolic features such as automatic differentiation. It requires CUDA.


                                        • ViennaCL provides CUDA, OpenCL and OpenMP computing backends. It enables simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common sparse and dense linear algebra operations (BLAS levels 1, 2 and 3). It also provides iterative solvers with optional preconditioners for large systems of equations.
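As an aside, the kind of dense vector update that sits inside many first-order or projection-type optimization methods can be offloaded to a GPU in very few lines with Numba. The following is a minimal sketch, assuming a CUDA-capable Nvidia GPU and a working numba installation; the kernel, sizes and names are illustrative and not taken from any of the libraries above.

```python
import numpy as np
from numba import cuda

@cuda.jit
def saxpy(alpha, x, y, out):
    # One thread per vector element: out[i] = alpha * x[i] + y[i]
    i = cuda.grid(1)
    if i < x.size:
        out[i] = alpha * x[i] + y[i]

n = 1_000_000
x = np.random.rand(n).astype(np.float32)   # FP32 is far faster than FP64 on consumer cards
y = np.random.rand(n).astype(np.float32)

# Explicit transfers keep the data on the device between kernel launches
d_x, d_y = cuda.to_device(x), cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](np.float32(2.0), d_x, d_y, d_out)

out = d_out.copy_to_host()
```

The pattern (transfer once, then launch many small kernels) is what most of the libraries above automate for you; the speed-up over NumPy only materializes once the vectors are large enough to amortize the host-device transfers.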


A good website for FP64, FP32 and FP16 benchmarks is Lambda Labs; one article in particular ("Deep Learning GPU Benchmarks - Tesla V100 vs RTX 2080 Ti vs GTX 1080 Ti vs Titan V") offers a great bottom line on what you get and how much it costs. Don't let the DL slant discourage you: DL can be used for optimization (fast results, not guaranteed to be optimal), either as a starting point for your variables or as a final result. My purpose in mentioning this article is to cite these quotes:




                                        "Results summary



                                        As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning research on a single GPU system running TensorFlow. A typical single GPU system with this GPU will be:



                                        • 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more expensive.

                                        • 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more expensive.

                                        • 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.

                                        • 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.

                                        Note that all experiments utilized Tensor Cores when available and are priced out on a complete single GPU system cost. As a system builder and AI research company, we're trying to make benchmarks that are scientific, reproducible, correlate with real world training scenarios, and have accurate prices. So, we've decided to make the spreadsheet that generated our graphs and (performance / $) tables public."



                                        ...



                                        "2080 Ti vs V100 - is the 2080 Ti really that fast?



                                        How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? The answer is simple: NVIDIA wants to segment the market so that those with high willingness to pay (hyper scalers) only buy their TESLA line of cards which retail for ~$9,800. The RTX and GTX series of cards still offers the best performance per dollar.



                                        If you're not AWS, Azure, or Google Cloud then you're probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:



                                        • If you need FP64 compute. If you're doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you'll need to buy the Titan V or V100s. If you're not sure if you need FP64, you don't. You would know.

                                        • If you absolutely need 32 GB of memory because your model size won't fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can't fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.

                                        So. You're still wondering. Why would anybody buy the V100? It comes down to marketing.



                                        2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron



                                        The V100 is a bit like a Bugatti Veyron. It's one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can't afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It's very fast, handles well, expensive but not ostentatious, and with the same amount of money you'd pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement. [Rob's note: costs are different for him compared to my calculations.]



                                        And if you think I'm going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.".




                                        Thus, you want to pick a GPU manufacturer that provides better benchmarks for the programs you want to run, unless you have the source code and possess some GPU tweaking skills.



The best deal is probably the AMD Radeon VII, with its 1/4-rate FP64 for only US$700; even though it is new, it is also being discontinued, so there may be some price drops coming. Unfortunately, while the hardware is probably the better deal for many people, the software that can wrestle that performance out of it is far scarcer and less developed than what is available for an Nvidia card.




2. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?



                                        All of the above links list software that benefits from more GPU cores, even if they are spread across multiple cards, multiple machines or even cloud GPU computing in some cases.
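For a concrete feel of the "massively parallel" interface mentioned for pagmo/pygmo above, here is a minimal sketch, assuming pygmo is installed. Note that pygmo itself distributes work over islands (threads or processes); whether a GPU is involved depends entirely on how the user-defined problem's fitness is implemented, so treat this only as an illustration of the parallel-evolution API on a stock CPU test problem.

```python
import pygmo as pg

prob = pg.problem(pg.rosenbrock(dim=30))   # stock benchmark problem (CPU-only fitness)
algo = pg.algorithm(pg.sade(gen=200))      # self-adaptive differential evolution

# Eight islands, each evolving a population of 20 candidates in parallel
archi = pg.archipelago(n=8, algo=algo, prob=prob, pop_size=20)
archi.evolve()
archi.wait_check()

best = min(f[0] for f in archi.get_champions_f())
print("best objective found:", best)
```

Swapping the stock problem for a user-defined problem whose fitness evaluation calls into a GPU library (for example via Numba or scikit-CUDA) is how GPU acceleration would actually enter this picture.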



An often-quoted article about selecting a GPU and using multiple GPUs is "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning" (2019-04-03) by Tim Dettmers. While its focus is on Deep Learning, it provides an excellent explanation of the difficulties, and of the performance increase to expect, when using multiple GPUs.



He also offers general advice on choosing a GPU vendor; again it is written with DL in mind, but it is still applicable to OR optimization:




                                        "Overall I think I still cannot give a clear recommendation for AMD GPUs for ordinary users that just want their GPUs to work smoothly. More experienced users should have fewer problems and by supporting AMD GPUs and ROCm/HIP developers they contribute to the combat against the monopoly position of NVIDIA as this will greatly benefit everyone in the long-term. If you are a GPU developer and want to make important contributions to GPU computing, then an AMD GPU might be the best way to make a good impact over the long-term. For everyone else, NVIDIA GPUs might be the safer choice.".




                                        Articles about using GPUs for Operations Research:



                                        • GPU Computing Applied to Linear and Mixed Integer Programming


                                        • GPU computing in discrete optimization. Part II: Survey focused on routing problems


                                        • gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters


                                        I'll return later to expand this answer.






share|improve this answer

$endgroup$

edited Jul 22 at 18:41

answered Jul 22 at 4:21

Rob
1,631 reputation • 1 gold badge • 6 silver badges • 27 bronze badges










                                        • 1




                                          $begingroup$
Thanks. The post by Tim Dettmers inspired my question.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 5:36










                                        • $begingroup$
                                          @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                          $endgroup$
                                          – Rob
                                          Jul 22 at 5:45











                                        • $begingroup$
The last two papers you linked are interesting, but I need to find the time to read them carefully.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 8:59










                                        • $begingroup$
                                          indeed the last two papers look interesting. I browsed through the first one which is a meta analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016 with many cited papers from early 201Xs) but even at that time the results at least for exact approaches and simplex look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.
                                          $endgroup$
                                          – JakobS
                                          Jul 22 at 10:49















                                        6
















                                        $begingroup$


                                        1. Which GPU, if any, should I get for mathematical optimization?



                                        In the case of commercially available software, where no source code is available, you are stuck using the GPU that is better supported by the applications you intend to run.



                                        • AmgX, cuSOLVER and nvGRAPH all require Nvidia GPUs, and offer supporting articles on their blog.


                                        • Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. It is written to use CUDA.



                                        • Hyperlearn requires CUDA. Offers GPU acceleration of:



                                          • Matrix Completion algorithms - Non Negative Least Squares, NNMF

                                          • Batch Similarity Latent Dirichelt Allocation (BS-LDA)

                                          • Correlation Regression

                                          • Feasible Generalized Least Squares FGLS

                                          • Outlier Tolerant Regression

                                          • Multidimensional Spline Regression

                                          • Generalized MICE (any model drop in replacement)

                                          • Using Uber's Pyro for Bayesian Deep Learning


                                        • Matlab only supports GPU acceleration on Nvidia GPUs when using the Parallel Computing Toolbox, otherwise any graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended.


                                        • Pagmo2 supports both Nvidia and AMD GPU acceleration. Pagmo (C++) or pygmo (Python) is a scientific library for massively parallel optimization. It is built around the idea of providing a unified interface to optimization algorithms and to optimization problems and to make their deployment in massively parallel environments easy. A short list of some papers from the European Space Agency where pagmo was utilized.


                                        • Python has a number of libraries that support CUDA, but not as much support for AMD GPUs and OpenCL, some libraries such as Numba support both GPU manufacturers but Nvidia certainly blogs about it more.


                                        • scikit-CUDA provides Python interfaces to many of the functions in the CUDA device/runtime, cuBLAS, cuFFT, and cuSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided.


                                        • SuiteSparse libraries for sparse matrix operations on Nvidia GPUs.


                                        • Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once Theano can minimize the amount of compilation/analysis overhead, but still provide symbolic features such as automatic differentiation.It requires CUDA.


                                        • ViennaCL provides CUDA, OpenCL and OpenMP computing backends. It enables simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common sparse and dense linear algebra operations (BLAS levels 1, 2 and 3). It also provides iterative solvers with optional preconditioners for large systems of equations.


                                        A good website for benchmarks for FP64, FP32 and FP16 is Lambda Labs, one article in particular ("Deep Learning GPU Benchmarks - Tesla V100 vs RTX 2080 Ti vs GTX 1080 Ti vs Titan V") offers a great bottom line on what you get, and how much it costs. Don't let the DL slant discourage you, DL can be used for optimization (fast results, not guaranteed to be absolutely optimal) either as a starting point for your variables or a final result. My purpose of mentioning this article is to cite these quotes:




                                        "Results summary



                                        As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning research on a single GPU system running TensorFlow. A typical single GPU system with this GPU will be:



                                        • 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more expensive.

                                        • 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more expensive.

                                        • 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.

                                        • 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.

                                        Note that all experiments utilized Tensor Cores when available and are priced out on a complete single GPU system cost. As a system builder and AI research company, we're trying to make benchmarks that are scientific, reproducible, correlate with real world training scenarios, and have accurate prices. So, we've decided to make the spreadsheet that generated our graphs and (performance / $) tables public."



                                        ...



                                        "2080 Ti vs V100 - is the 2080 Ti really that fast?



                                        How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? The answer is simple: NVIDIA wants to segment the market so that those with high willingness to pay (hyper scalers) only buy their TESLA line of cards which retail for ~$9,800. The RTX and GTX series of cards still offers the best performance per dollar.



                                        If you're not AWS, Azure, or Google Cloud then you're probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:



                                        • If you need FP64 compute. If you're doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you'll need to buy the Titan V or V100s. If you're not sure if you need FP64, you don't. You would know.

                                        • If you absolutely need 32 GB of memory because your model size won't fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can't fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.

                                        So. You're still wondering. Why would anybody buy the V100? It comes down to marketing.



                                        2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron



                                        The V100 is a bit like a Bugatti Veyron. It's one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can't afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It's very fast, handles well, expensive but not ostentatious, and with the same amount of money you'd pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement. [Rob's note: costs are different for him compared to my calculations.]



                                        And if you think I'm going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.".




                                        Thus, you want to pick a GPU manufacturer that provides better benchmarks for the programs you want to run, unless you have the source code and possess some GPU tweaking skills.



                                        The best deal is probably the AMD Radeon VII with it's FP64 rate of 1/4 for only U$700, and even though it's new it's also being discontinued; so there may be some price drops coming. Unfortunately while the hardware is probably a better deal for many people the amount of software available that can wrestle the performance out of it is far fewer and not as developed as what's available for an Nvidia card.




                                        1. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?



                                        All of the above links list software that benefits from more GPU cores, even if they are spread across multiple cards, multiple machines or even cloud GPU computing in some cases.



                                        An often quoted article about selecting a GPU and using multiple GPUs is: "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning" (2019-04-03) by Tim Dettmers. While it's focus is on Deep Learning it provides an excellent explanation of the difficulties and performance increase to be expected when using multiple GPUs.



                                        He also says something about the usage of GPUs, in general (where applicable), but again it's in reference to DL (though still applicable to OR optimization):




                                        "Overall I think I still cannot give a clear recommendation for AMD GPUs for ordinary users that just want their GPUs to work smoothly. More experienced users should have fewer problems and by supporting AMD GPUs and ROCm/HIP developers they contribute to the combat against the monopoly position of NVIDIA as this will greatly benefit everyone in the long-term. If you are a GPU developer and want to make important contributions to GPU computing, then an AMD GPU might be the best way to make a good impact over the long-term. For everyone else, NVIDIA GPUs might be the safer choice.".




                                        Articles about using GPUs for Operations Research:



                                        • GPU Computing Applied to Linear and Mixed Integer Programming


                                        • GPU computing in discrete optimization. Part II: Survey focused on routing problems


                                        • gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters


                                        I'll return later to expand this answer.






                                        share|improve this answer












                                        $endgroup$










                                        • 1




                                          $begingroup$
                                          Thanks. The post by Tim Dettmers was ispiring my question.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 5:36










                                        • $begingroup$
                                          @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                          $endgroup$
                                          – Rob
                                          Jul 22 at 5:45











                                        • $begingroup$
                                          the last two papers you linked are interesting, but I need to find out the time to read them carefully.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 8:59










                                        • $begingroup$
                                          indeed the last two papers look interesting. I browsed through the first one which is a meta analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016 with many cited papers from early 201Xs) but even at that time the results at least for exact approaches and simplex look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.
                                          $endgroup$
                                          – JakobS
                                          Jul 22 at 10:49













                                        6














                                        6










                                        6







                                        $begingroup$


                                        1. Which GPU, if any, should I get for mathematical optimization?



                                        In the case of commercially available software, where no source code is available, you are stuck using the GPU that is better supported by the applications you intend to run.



                                        • AmgX, cuSOLVER and nvGRAPH all require Nvidia GPUs, and offer supporting articles on their blog.


                                        • Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. It is written to use CUDA.



                                        • Hyperlearn requires CUDA. Offers GPU acceleration of:



                                          • Matrix Completion algorithms - Non Negative Least Squares, NNMF

                                          • Batch Similarity Latent Dirichelt Allocation (BS-LDA)

                                          • Correlation Regression

                                          • Feasible Generalized Least Squares FGLS

                                          • Outlier Tolerant Regression

                                          • Multidimensional Spline Regression

                                          • Generalized MICE (any model drop in replacement)

                                          • Using Uber's Pyro for Bayesian Deep Learning


                                        • Matlab only supports GPU acceleration on Nvidia GPUs when using the Parallel Computing Toolbox, otherwise any graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended.


                                        • Pagmo2 supports both Nvidia and AMD GPU acceleration. Pagmo (C++) or pygmo (Python) is a scientific library for massively parallel optimization. It is built around the idea of providing a unified interface to optimization algorithms and to optimization problems and to make their deployment in massively parallel environments easy. A short list of some papers from the European Space Agency where pagmo was utilized.


                                        • Python has a number of libraries that support CUDA, but not as much support for AMD GPUs and OpenCL, some libraries such as Numba support both GPU manufacturers but Nvidia certainly blogs about it more.


                                        • scikit-CUDA provides Python interfaces to many of the functions in the CUDA device/runtime, cuBLAS, cuFFT, and cuSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided.


                                        • SuiteSparse libraries for sparse matrix operations on Nvidia GPUs.


                                        • Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once Theano can minimize the amount of compilation/analysis overhead, but still provide symbolic features such as automatic differentiation.It requires CUDA.


                                        • ViennaCL provides CUDA, OpenCL and OpenMP computing backends. It enables simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common sparse and dense linear algebra operations (BLAS levels 1, 2 and 3). It also provides iterative solvers with optional preconditioners for large systems of equations.


                                        A good website for benchmarks for FP64, FP32 and FP16 is Lambda Labs, one article in particular ("Deep Learning GPU Benchmarks - Tesla V100 vs RTX 2080 Ti vs GTX 1080 Ti vs Titan V") offers a great bottom line on what you get, and how much it costs. Don't let the DL slant discourage you, DL can be used for optimization (fast results, not guaranteed to be absolutely optimal) either as a starting point for your variables or a final result. My purpose of mentioning this article is to cite these quotes:




                                        "Results summary



                                        As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning research on a single GPU system running TensorFlow. A typical single GPU system with this GPU will be:



                                        • 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more expensive.

                                        • 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more expensive.

                                        • 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.

                                        • 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.

                                        Note that all experiments utilized Tensor Cores when available and are priced out on a complete single GPU system cost. As a system builder and AI research company, we're trying to make benchmarks that are scientific, reproducible, correlate with real world training scenarios, and have accurate prices. So, we've decided to make the spreadsheet that generated our graphs and (performance / $) tables public."



                                        ...



                                        "2080 Ti vs V100 - is the 2080 Ti really that fast?



                                        How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? The answer is simple: NVIDIA wants to segment the market so that those with high willingness to pay (hyper scalers) only buy their TESLA line of cards which retail for ~$9,800. The RTX and GTX series of cards still offers the best performance per dollar.



                                        If you're not AWS, Azure, or Google Cloud then you're probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:



                                        • If you need FP64 compute. If you're doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you'll need to buy the Titan V or V100s. If you're not sure if you need FP64, you don't. You would know.

                                        • If you absolutely need 32 GB of memory because your model size won't fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can't fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.

                                        So. You're still wondering. Why would anybody buy the V100? It comes down to marketing.



                                        2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron



                                        The V100 is a bit like a Bugatti Veyron. It's one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can't afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It's very fast, handles well, expensive but not ostentatious, and with the same amount of money you'd pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement. [Rob's note: costs are different for him compared to my calculations.]



                                        And if you think I'm going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.".




                                        Thus, you want to pick a GPU manufacturer that provides better benchmarks for the programs you want to run, unless you have the source code and possess some GPU tweaking skills.



                                        The best deal is probably the AMD Radeon VII with it's FP64 rate of 1/4 for only U$700, and even though it's new it's also being discontinued; so there may be some price drops coming. Unfortunately while the hardware is probably a better deal for many people the amount of software available that can wrestle the performance out of it is far fewer and not as developed as what's available for an Nvidia card.




                                        1. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?



                                        All of the above links list software that benefits from more GPU cores, even if they are spread across multiple cards, multiple machines or even cloud GPU computing in some cases.



                                        An often quoted article about selecting a GPU and using multiple GPUs is: "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning" (2019-04-03) by Tim Dettmers. While it's focus is on Deep Learning it provides an excellent explanation of the difficulties and performance increase to be expected when using multiple GPUs.



                                        He also says something about the usage of GPUs, in general (where applicable), but again it's in reference to DL (though still applicable to OR optimization):




                                        "Overall I think I still cannot give a clear recommendation for AMD GPUs for ordinary users that just want their GPUs to work smoothly. More experienced users should have fewer problems and by supporting AMD GPUs and ROCm/HIP developers they contribute to the combat against the monopoly position of NVIDIA as this will greatly benefit everyone in the long-term. If you are a GPU developer and want to make important contributions to GPU computing, then an AMD GPU might be the best way to make a good impact over the long-term. For everyone else, NVIDIA GPUs might be the safer choice.".




                                        Articles about using GPUs for Operations Research:



                                        • GPU Computing Applied to Linear and Mixed Integer Programming


                                        • GPU computing in discrete optimization. Part II: Survey focused on routing problems


                                        • gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters


                                        I'll return later to expand this answer.






                                        share|improve this answer












                                        $endgroup$




                                        1. Which GPU, if any, should I get for mathematical optimization?



                                        In the case of commercially available software, where no source code is available, you are stuck using the GPU that is better supported by the applications you intend to run.



                                        • AmgX, cuSOLVER and nvGRAPH all require Nvidia GPUs, and offer supporting articles on their blog.


                                        • Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems. It is written to use CUDA.



                                        • Hyperlearn requires CUDA. Offers GPU acceleration of:



                                          • Matrix Completion algorithms - Non Negative Least Squares, NNMF

                                          • Batch Similarity Latent Dirichelt Allocation (BS-LDA)

                                          • Correlation Regression

                                          • Feasible Generalized Least Squares FGLS

                                          • Outlier Tolerant Regression

                                          • Multidimensional Spline Regression

                                          • Generalized MICE (any model drop in replacement)

                                          • Using Uber's Pyro for Bayesian Deep Learning


                                        • Matlab only supports GPU acceleration on Nvidia GPUs when using the Parallel Computing Toolbox, otherwise any graphics card supporting OpenGL 3.3 with 1GB GPU memory is recommended.


                                        • Pagmo2 supports both Nvidia and AMD GPU acceleration. Pagmo (C++) or pygmo (Python) is a scientific library for massively parallel optimization. It is built around the idea of providing a unified interface to optimization algorithms and to optimization problems and to make their deployment in massively parallel environments easy. A short list of some papers from the European Space Agency where pagmo was utilized.


                                        • Python has a number of libraries that support CUDA, but not as much support for AMD GPUs and OpenCL, some libraries such as Numba support both GPU manufacturers but Nvidia certainly blogs about it more.


                                        • scikit-CUDA provides Python interfaces to many of the functions in the CUDA device/runtime, cuBLAS, cuFFT, and cuSOLVER libraries distributed as part of NVIDIA’s CUDA Programming Toolkit, as well as interfaces to select functions in the CULA Dense Toolkit. Both low-level wrapper functions similar to their C counterparts and high-level functions comparable to those in NumPy and Scipy are provided.


                                        • SuiteSparse libraries for sparse matrix operations on Nvidia GPUs.


                                        • Theano combines aspects of a computer algebra system (CAS) with aspects of an optimizing compiler. It can also generate customized C code for many mathematical operations. This combination of CAS with optimizing compilation is particularly useful for tasks in which complicated mathematical expressions are evaluated repeatedly and evaluation speed is critical. For situations where many different expressions are each evaluated once Theano can minimize the amount of compilation/analysis overhead, but still provide symbolic features such as automatic differentiation.It requires CUDA.


                                        • ViennaCL provides CUDA, OpenCL and OpenMP computing backends. It enables simple, high-level access to the vast computing resources available on parallel architectures such as GPUs and is primarily focused on common sparse and dense linear algebra operations (BLAS levels 1, 2 and 3). It also provides iterative solvers with optional preconditioners for large systems of equations.


                                        A good website for benchmarks for FP64, FP32 and FP16 is Lambda Labs, one article in particular ("Deep Learning GPU Benchmarks - Tesla V100 vs RTX 2080 Ti vs GTX 1080 Ti vs Titan V") offers a great bottom line on what you get, and how much it costs. Don't let the DL slant discourage you, DL can be used for optimization (fast results, not guaranteed to be absolutely optimal) either as a starting point for your variables or a final result. My purpose of mentioning this article is to cite these quotes:




                                        "Results summary



                                        As of February 8, 2019, the NVIDIA RTX 2080 Ti is the best GPU for deep learning research on a single GPU system running TensorFlow. A typical single GPU system with this GPU will be:



                                        • 37% faster than the 1080 Ti with FP32, 62% faster with FP16, and 25% more expensive.

                                        • 35% faster than the 2080 with FP32, 47% faster with FP16, and 25% more expensive.

                                        • 96% as fast as the Titan V with FP32, 3% faster with FP16, and ~1/2 of the cost.

                                        • 80% as fast as the Tesla V100 with FP32, 82% as fast with FP16, and ~1/5 of the cost.

                                        Note that all experiments utilized Tensor Cores when available and are priced out on a complete single GPU system cost. As a system builder and AI research company, we're trying to make benchmarks that are scientific, reproducible, correlate with real world training scenarios, and have accurate prices. So, we've decided to make the spreadsheet that generated our graphs and (performance / $) tables public."



                                        ...



                                        "2080 Ti vs V100 - is the 2080 Ti really that fast?



                                        How can the 2080 Ti be 80% as fast as the Tesla V100, but only 1/8th of the price? The answer is simple: NVIDIA wants to segment the market so that those with high willingness to pay (hyper scalers) only buy their TESLA line of cards which retail for ~$9,800. The RTX and GTX series of cards still offers the best performance per dollar.



                                        If you're not AWS, Azure, or Google Cloud then you're probably much better off buying the 2080 Ti. There are, however, a few key use cases where the V100s can come in handy:



                                        • If you need FP64 compute. If you're doing Computational Fluid Dynamics, n-body simulation, or other work that requires high numerical precision (FP64), then you'll need to buy the Titan V or V100s. If you're not sure if you need FP64, you don't. You would know.

                                        • If you absolutely need 32 GB of memory because your model size won't fit into 11 GB of memory with a batch size of 1. If you are creating your own model architecture and it simply can't fit even when you bring the batch size lower, the V100 could make sense. However, this is a pretty rare edge case. Fewer than 5% of our customers are using custom models. Most use something like ResNet, VGG, Inception, SSD, or Yolo.

                                        So. You're still wondering. Why would anybody buy the V100? It comes down to marketing.



                                        2080 Ti is a Porsche 911, the V100 is a Bugatti Veyron



                                        The V100 is a bit like a Bugatti Veyron. It's one of the fastest street legal cars in the world, ridiculously expensive, and, if you have to ask how much the insurance and maintenance is, you can't afford it. The RTX 2080 Ti, on the other hand, is like a Porsche 911. It's very fast, handles well, expensive but not ostentatious, and with the same amount of money you'd pay for the Bugatti, you can buy the Porsche, a home, a BMW 7-series, send three kids to college, and have money left over for retirement. [Rob's note: costs are different for him compared to my calculations.]



                                        And if you think I'm going overboard with the Porsche analogy, you can buy a DGX-1 8x V100 for $120,000 or a Lambda Blade 8x 2080 Ti for $28,000 and have enough left over for a real Porsche 911. Your pick.".




                                        Thus, you want to pick a GPU manufacturer that provides better benchmarks for the programs you want to run, unless you have the source code and possess some GPU tweaking skills.



                                        The best deal is probably the AMD Radeon VII with it's FP64 rate of 1/4 for only U$700, and even though it's new it's also being discontinued; so there may be some price drops coming. Unfortunately while the hardware is probably a better deal for many people the amount of software available that can wrestle the performance out of it is far fewer and not as developed as what's available for an Nvidia card.




                                        1. Does there exist any mathematical optimization software that can fully exploit multiple modern GPUs?



                                        All of the above links list software that benefits from more GPU cores, even if they are spread across multiple cards, multiple machines or even cloud GPU computing in some cases.



                                        An often quoted article about selecting a GPU and using multiple GPUs is: "Which GPU(s) to Get for Deep Learning: My Experience and Advice for Using GPUs in Deep Learning" (2019-04-03) by Tim Dettmers. While it's focus is on Deep Learning it provides an excellent explanation of the difficulties and performance increase to be expected when using multiple GPUs.



                                        He also says something about the usage of GPUs, in general (where applicable), but again it's in reference to DL (though still applicable to OR optimization):




                                        "Overall I think I still cannot give a clear recommendation for AMD GPUs for ordinary users that just want their GPUs to work smoothly. More experienced users should have fewer problems and by supporting AMD GPUs and ROCm/HIP developers they contribute to the combat against the monopoly position of NVIDIA as this will greatly benefit everyone in the long-term. If you are a GPU developer and want to make important contributions to GPU computing, then an AMD GPU might be the best way to make a good impact over the long-term. For everyone else, NVIDIA GPUs might be the safer choice.".




                                        Articles about using GPUs for Operations Research:



                                        • GPU Computing Applied to Linear and Mixed Integer Programming


                                        • GPU computing in discrete optimization. Part II: Survey focused on routing problems


                                        • gpuMF: a framework for parallel hybrid metaheuristics on GPU with application to the minimisation of harmonics in multilevel inverters


                                        I'll return later to expand this answer.







                                        share|improve this answer















                                        share|improve this answer




                                        share|improve this answer








                                        edited Jul 22 at 18:41

























                                        answered Jul 22 at 4:21









                                        RobRob

                                        1,6311 gold badge6 silver badges27 bronze badges




                                        1,6311 gold badge6 silver badges27 bronze badges










                                        • 1




                                          $begingroup$
                                          Thanks. The post by Tim Dettmers was ispiring my question.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 5:36










                                        • $begingroup$
                                          @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                          $endgroup$
                                          – Rob
                                          Jul 22 at 5:45











                                        • $begingroup$
                                          the last two papers you linked are interesting, but I need to find out the time to read them carefully.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 8:59










                                        • $begingroup$
                                          indeed the last two papers look interesting. I browsed through the first one which is a meta analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016 with many cited papers from early 201Xs) but even at that time the results at least for exact approaches and simplex look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.
                                          $endgroup$
                                          – JakobS
                                          Jul 22 at 10:49












                                        • 1




                                          $begingroup$
                                          Thanks. The post by Tim Dettmers was ispiring my question.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 5:36










                                        • $begingroup$
                                          @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                          $endgroup$
                                          – Rob
                                          Jul 22 at 5:45











                                        • $begingroup$
                                          the last two papers you linked are interesting, but I need to find out the time to read them carefully.
                                          $endgroup$
                                          – Stefano Gualandi
                                          Jul 22 at 8:59










                                        • $begingroup$
                                          indeed the last two papers look interesting. I browsed through the first one which is a meta analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016 with many cited papers from early 201Xs) but even at that time the results at least for exact approaches and simplex look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.
                                          $endgroup$
                                          – JakobS
                                          Jul 22 at 10:49







                                        1




                                        1




                                        $begingroup$
                                        Thanks. The post by Tim Dettmers was ispiring my question.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 22 at 5:36




                                        $begingroup$
                                        Thanks. The post by Tim Dettmers was ispiring my question.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 22 at 5:36












                                        $begingroup$
                                        @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                        $endgroup$
                                        – Rob
                                        Jul 22 at 5:45





                                        $begingroup$
                                        @StefanoGualandi - You are most welcome. I was just on my way back with an addition when I was diverted, but you can check back tomorrow. BTW: How many bits are you planning on using most often (FP64, bfloat, INT2-8, etc.), how much memory (can you afford, over paying for performance). And, what is your overall budget - do you want 4x$500 cards or 2x$3000 cards, for example. Do you want bang for your $, or simply very fast but not most expensive? Please add to your question any additional info you wish to offer and I'll try to address a specific situation.
                                        $endgroup$
                                        – Rob
                                        Jul 22 at 5:45













                                        $begingroup$
                                        the last two papers you linked are interesting, but I need to find out the time to read them carefully.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 22 at 8:59




                                        $begingroup$
                                        the last two papers you linked are interesting, but I need to find out the time to read them carefully.
                                        $endgroup$
                                        – Stefano Gualandi
                                        Jul 22 at 8:59












                                        $begingroup$
indeed the last two papers look interesting. I browsed through the first one, which is a meta-analysis of several other papers that use GPUs for OR problems. The paper is somewhat old (from 2016, with many cited papers from the early 2010s), but even at that time the results, at least for exact approaches and the simplex method, look underwhelming: "The authors use randomly generated instances of ATSP with up to 16 cities", "...with 8,000 variables and 2,700 constraints". Often it is not even clear what they compare against.
                                        $endgroup$
                                        – JakobS
                                        Jul 22 at 10:49



