Performance gap between vector and arrayIs std::vector so much slower than plain arrays?Why aren't variable-length arrays part of the C++ standard?Using arrays or std::vectors in C++, what's the performance gap?Create ArrayList from arrayHow do I check if an array includes an object in JavaScript?How to append something to an array?Improve INSERT-per-second performance of SQLite?What is the difference between call and apply?Loop through an array in JavaScriptHow do I remove a particular element from an array in JavaScript?For-each over an array in JavaScript?Why is processing a sorted array faster than processing an unsorted array?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations

What is joint estimation?

What are the consequences for downstream actors of redistributing a work under a wider CC license than the copyright holder authorized?

Can the bass be used instead of drums?

Is self-defense mutually exclusive of murder?

Why are seats at the rear of a plane sometimes unavailable even though many other seats are available in the plane?

What can I do to avoid potential charges for bribery?

Type and strength of a typical chain

Is consistent disregard for students' time "normal" in undergraduate research?

An employee has low self-confidence, and is performing poorly. How can I help?

Can I use I2C over 2m cables?

Dedicated solver for convex problems

Match the blocks

Suspicious crontab entry running 'xribfa4' every 15 minutes

Why is matter-antimatter asymmetry surprising, if asymmetry can be generated by a random walk in which particles go into black holes?

A demigod among men

Why did the range based for loop specification change in C++17

How do you translate "Don't Fear the Reaper" into Latin?

Can you decide not to sneak into a room after seeing your roll?

What good is Divine Sense?

Why is CMYK & PNG not possible?

Does Windows 10 Fast Startup feature drain battery while laptop is turned off?

My name was added to manuscript as co-author without my consent; how to get it removed?

Would Anti-Magic Zone Affect Dragon Breath?

What is a terminal plane in high frequency circuits?



Performance gap between vector and array


Is std::vector so much slower than plain arrays?Why aren't variable-length arrays part of the C++ standard?Using arrays or std::vectors in C++, what's the performance gap?Create ArrayList from arrayHow do I check if an array includes an object in JavaScript?How to append something to an array?Improve INSERT-per-second performance of SQLite?What is the difference between call and apply?Loop through an array in JavaScriptHow do I remove a particular element from an array in JavaScript?For-each over an array in JavaScript?Why is processing a sorted array faster than processing an unsorted array?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;









30

















I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n.



So I first came up with some code:



int countPrimes(int n) 
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:



int countPrimes(int n) 
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.



EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool> with vector<char>.










share|improve this question























  • 6





    How are you compiling your code? What compiler options? Also; vector<bool> is special.

    – Jesper Juhl
    Apr 18 at 11:45







  • 13





    be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

    – Martin Morterol
    Apr 18 at 11:49






  • 3





    you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

    – Marek R
    Apr 18 at 12:20







  • 3





    This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

    – Pete Becker
    Apr 18 at 12:53






  • 4





    If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

    – Jesper Juhl
    Apr 18 at 14:22


















30

















I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n.



So I first came up with some code:



int countPrimes(int n) 
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:



int countPrimes(int n) 
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.



EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool> with vector<char>.










share|improve this question























  • 6





    How are you compiling your code? What compiler options? Also; vector<bool> is special.

    – Jesper Juhl
    Apr 18 at 11:45







  • 13





    be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

    – Martin Morterol
    Apr 18 at 11:49






  • 3





    you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

    – Marek R
    Apr 18 at 12:20







  • 3





    This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

    – Pete Becker
    Apr 18 at 12:53






  • 4





    If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

    – Jesper Juhl
    Apr 18 at 14:22














30












30








30


1






I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n.



So I first came up with some code:



int countPrimes(int n) 
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:



int countPrimes(int n) 
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.



EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool> with vector<char>.










share|improve this question
















I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n.



So I first came up with some code:



int countPrimes(int n) 
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:



int countPrimes(int n) 
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)

if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;

int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;



which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.



EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool> with vector<char>.







c++ arrays performance vector std






share|improve this question















share|improve this question













share|improve this question




share|improve this question



share|improve this question








edited Apr 18 at 22:29









isanae

2,6551 gold badge16 silver badges38 bronze badges




2,6551 gold badge16 silver badges38 bronze badges










asked Apr 18 at 11:43









Qin HeyangQin Heyang

4373 silver badges8 bronze badges




4373 silver badges8 bronze badges










  • 6





    How are you compiling your code? What compiler options? Also; vector<bool> is special.

    – Jesper Juhl
    Apr 18 at 11:45







  • 13





    be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

    – Martin Morterol
    Apr 18 at 11:49






  • 3





    you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

    – Marek R
    Apr 18 at 12:20







  • 3





    This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

    – Pete Becker
    Apr 18 at 12:53






  • 4





    If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

    – Jesper Juhl
    Apr 18 at 14:22













  • 6





    How are you compiling your code? What compiler options? Also; vector<bool> is special.

    – Jesper Juhl
    Apr 18 at 11:45







  • 13





    be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

    – Martin Morterol
    Apr 18 at 11:49






  • 3





    you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

    – Marek R
    Apr 18 at 12:20







  • 3





    This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

    – Pete Becker
    Apr 18 at 12:53






  • 4





    If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

    – Jesper Juhl
    Apr 18 at 14:22








6




6





How are you compiling your code? What compiler options? Also; vector<bool> is special.

– Jesper Juhl
Apr 18 at 11:45






How are you compiling your code? What compiler options? Also; vector<bool> is special.

– Jesper Juhl
Apr 18 at 11:45





13




13





be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

– Martin Morterol
Apr 18 at 11:49





be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector

– Martin Morterol
Apr 18 at 11:49




3




3





you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

– Marek R
Apr 18 at 12:20






you may gain some time changing one loop a bit: for(int i = 2; i * i <n;i++) since if i * i >= n then next loop does nothing.

– Marek R
Apr 18 at 12:20





3




3





This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

– Pete Becker
Apr 18 at 12:53





This doesn't address the question, but when you're dealing with boolean types, use true and false and not 1. So: vector<bool> flag(n+1, true); and if (flag[i]). That doesn't affect the result, but it makes it much clearer what you're doing.

– Pete Becker
Apr 18 at 12:53




4




4





If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

– Jesper Juhl
Apr 18 at 14:22






If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.

– Jesper Juhl
Apr 18 at 14:22













4 Answers
4






active

oldest

votes


















29


















std::vector<bool> isn't like any other vector. The documentation says:




std::vector<bool> is a possibly space-efficient specialization of
std::vector for the type bool.




That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.






share|improve this answer

































    16


















    std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).



    Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.



    Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).



    As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.






    share|improve this answer



































      6


















      I'd like to add some remarks to the good answers already posted.




      • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.



        See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).



      • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.


      • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.


      If you can bare the less readable code, you could try to profile the following snippet.



      int countPrimes(int n)

      if ( n < 2 )
      return 0;
      // Sieve starting from 3 up to n, the number of odd number between 3 and n are
      int sieve_size = n / 2 - 1;
      std::vector<char> sieve(sieve_size);
      int result = 1; // 2 is a prime.

      for (int i = 0; i < sieve_size; ++i)

      if ( sieve[i] == 0 )

      // It's a prime, no need to scan the vector again
      ++result;
      // Some ugly transformations are needed, here
      int prime = i * 2 + 3;
      for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
      sieve[j / 2 - 1] = 1;



      return result;



      Edit



      As Peter Cordes noted in the comments, using an unsigned type for the variable j




      the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.




      It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.






      share|improve this answer























      • 1





        Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

        – cmaster
        Apr 18 at 15:53











      • @cmaster Well, that's debatable ;)

        – Bob__
        Apr 18 at 16:02






      • 1





        Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

        – cmaster
        Apr 18 at 20:09






      • 1





        Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

        – cmaster
        Apr 18 at 20:11






      • 1





        gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

        – Peter Cordes
        Apr 19 at 9:38



















      0


















      I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:




      • vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16


      • vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52


      • bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69

      Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).




      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/resource.h>
      #include <vector>

      size_t countPrimes(size_t n)
      std::vector<bool> flag(n+1,1);
      //std::vector<char> flag(n+1,1);
      //bool flag[n+1]; std::fill(flag,flag+n+1,true);
      for(size_t i=2;i<n;i++)
      if(flag[i]==1)
      for(size_t j=i;i*j<n;j++)
      flag[i*j]=0;



      size_t result=0;
      for(size_t i=2;i<n;i++)
      result+=flag[i];

      return result;


      int main()

      const rlim_t kStackSize = 16*1024*1024;
      struct rlimit rl;
      int result = getrlimit(RLIMIT_STACK, &rl);
      if(result != 0) abort();
      if(rl.rlim_cur < kStackSize)
      rl.rlim_cur = kStackSize;
      result = setrlimit(RLIMIT_STACK, &rl);
      if(result != 0) abort();



      printf("%zun", countPrimes(10e6));
      return 0;






      share|improve this answer



























        Your Answer






        StackExchange.ifUsing("editor", function ()
        StackExchange.using("externalEditor", function ()
        StackExchange.using("snippets", function ()
        StackExchange.snippets.init();
        );
        );
        , "code-snippets");

        StackExchange.ready(function()
        var channelOptions =
        tags: "".split(" "),
        id: "1"
        ;
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function()
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled)
        StackExchange.using("snippets", function()
        createEditor();
        );

        else
        createEditor();

        );

        function createEditor()
        StackExchange.prepareEditor(
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader:
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        ,
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        );



        );














        draft saved

        draft discarded
















        StackExchange.ready(
        function ()
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55745312%2fperformance-gap-between-vectorbool-and-array%23new-answer', 'question_page');

        );

        Post as a guest















        Required, but never shown


























        4 Answers
        4






        active

        oldest

        votes








        4 Answers
        4






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        29


















        std::vector<bool> isn't like any other vector. The documentation says:




        std::vector<bool> is a possibly space-efficient specialization of
        std::vector for the type bool.




        That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.






        share|improve this answer






























          29


















          std::vector<bool> isn't like any other vector. The documentation says:




          std::vector<bool> is a possibly space-efficient specialization of
          std::vector for the type bool.




          That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.






          share|improve this answer




























            29














            29










            29









            std::vector<bool> isn't like any other vector. The documentation says:




            std::vector<bool> is a possibly space-efficient specialization of
            std::vector for the type bool.




            That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.






            share|improve this answer














            std::vector<bool> isn't like any other vector. The documentation says:




            std::vector<bool> is a possibly space-efficient specialization of
            std::vector for the type bool.




            That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.







            share|improve this answer













            share|improve this answer




            share|improve this answer



            share|improve this answer










            answered Apr 18 at 11:52









            BlazeBlaze

            12k1 gold badge17 silver badges37 bronze badges




            12k1 gold badge17 silver badges37 bronze badges


























                16


















                std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).



                Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.



                Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).



                As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.






                share|improve this answer
































                  16


















                  std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).



                  Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.



                  Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).



                  As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.






                  share|improve this answer






























                    16














                    16










                    16









                    std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).



                    Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.



                    Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).



                    As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.






                    share|improve this answer
















                    std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).



                    Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.



                    Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).



                    As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.







                    share|improve this answer















                    share|improve this answer




                    share|improve this answer



                    share|improve this answer








                    edited Apr 18 at 12:15









                    Jesper Juhl

                    1




                    1










                    answered Apr 18 at 11:53









                    Marek RMarek R

                    15.7k3 gold badges30 silver badges81 bronze badges




                    15.7k3 gold badges30 silver badges81 bronze badges
























                        6


















                        I'd like to add some remarks to the good answers already posted.




                        • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.



                          See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).



                        • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.


                        • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.


                        If you can bare the less readable code, you could try to profile the following snippet.



                        int countPrimes(int n)

                        if ( n < 2 )
                        return 0;
                        // Sieve starting from 3 up to n, the number of odd number between 3 and n are
                        int sieve_size = n / 2 - 1;
                        std::vector<char> sieve(sieve_size);
                        int result = 1; // 2 is a prime.

                        for (int i = 0; i < sieve_size; ++i)

                        if ( sieve[i] == 0 )

                        // It's a prime, no need to scan the vector again
                        ++result;
                        // Some ugly transformations are needed, here
                        int prime = i * 2 + 3;
                        for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
                        sieve[j / 2 - 1] = 1;



                        return result;



                        Edit



                        As Peter Cordes noted in the comments, using an unsigned type for the variable j




                        the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.




                        It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.






                        share|improve this answer























                        • 1





                          Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                          – cmaster
                          Apr 18 at 15:53











                        • @cmaster Well, that's debatable ;)

                          – Bob__
                          Apr 18 at 16:02






                        • 1





                          Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                          – cmaster
                          Apr 18 at 20:09






                        • 1





                          Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                          – cmaster
                          Apr 18 at 20:11






                        • 1





                          gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                          – Peter Cordes
                          Apr 19 at 9:38
















                        6


















                        I'd like to add some remarks to the good answers already posted.




                        • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.



                          See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).



                        • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.


                        • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.


                        If you can bare the less readable code, you could try to profile the following snippet.



                        int countPrimes(int n)

                        if ( n < 2 )
                        return 0;
                        // Sieve starting from 3 up to n, the number of odd number between 3 and n are
                        int sieve_size = n / 2 - 1;
                        std::vector<char> sieve(sieve_size);
                        int result = 1; // 2 is a prime.

                        for (int i = 0; i < sieve_size; ++i)

                        if ( sieve[i] == 0 )

                        // It's a prime, no need to scan the vector again
                        ++result;
                        // Some ugly transformations are needed, here
                        int prime = i * 2 + 3;
                        for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
                        sieve[j / 2 - 1] = 1;



                        return result;



                        Edit



                        As Peter Cordes noted in the comments, using an unsigned type for the variable j




                        the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.




                        It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.






                        share|improve this answer























                        • 1





                          Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                          – cmaster
                          Apr 18 at 15:53











                        • @cmaster Well, that's debatable ;)

                          – Bob__
                          Apr 18 at 16:02






                        • 1





                          Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                          – cmaster
                          Apr 18 at 20:09






                        • 1





                          Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                          – cmaster
                          Apr 18 at 20:11






                        • 1





                          gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                          – Peter Cordes
                          Apr 19 at 9:38














                        6














                        6










                        6









                        I'd like to add some remarks to the good answers already posted.




                        • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.



                          See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).



                        • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.


                        • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.


                        If you can bare the less readable code, you could try to profile the following snippet.



                        int countPrimes(int n)

                        if ( n < 2 )
                        return 0;
                        // Sieve starting from 3 up to n, the number of odd number between 3 and n are
                        int sieve_size = n / 2 - 1;
                        std::vector<char> sieve(sieve_size);
                        int result = 1; // 2 is a prime.

                        for (int i = 0; i < sieve_size; ++i)

                        if ( sieve[i] == 0 )

                        // It's a prime, no need to scan the vector again
                        ++result;
                        // Some ugly transformations are needed, here
                        int prime = i * 2 + 3;
                        for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
                        sieve[j / 2 - 1] = 1;



                        return result;



                        Edit



                        As Peter Cordes noted in the comments, using an unsigned type for the variable j




                        the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.




                        It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.






                        share|improve this answer
















                        I'd like to add some remarks to the good answers already posted.




                        • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.



                          See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).



                        • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.


                        • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.


                        If you can bare the less readable code, you could try to profile the following snippet.



                        int countPrimes(int n)

                        if ( n < 2 )
                        return 0;
                        // Sieve starting from 3 up to n, the number of odd number between 3 and n are
                        int sieve_size = n / 2 - 1;
                        std::vector<char> sieve(sieve_size);
                        int result = 1; // 2 is a prime.

                        for (int i = 0; i < sieve_size; ++i)

                        if ( sieve[i] == 0 )

                        // It's a prime, no need to scan the vector again
                        ++result;
                        // Some ugly transformations are needed, here
                        int prime = i * 2 + 3;
                        for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
                        sieve[j / 2 - 1] = 1;



                        return result;



                        Edit



                        As Peter Cordes noted in the comments, using an unsigned type for the variable j




                        the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.




                        It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.







                        share|improve this answer















                        share|improve this answer




                        share|improve this answer



                        share|improve this answer








                        edited Apr 19 at 9:37

























                        answered Apr 18 at 14:59









                        Bob__Bob__

                        6,1233 gold badges17 silver badges28 bronze badges




                        6,1233 gold badges17 silver badges28 bronze badges










                        • 1





                          Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                          – cmaster
                          Apr 18 at 15:53











                        • @cmaster Well, that's debatable ;)

                          – Bob__
                          Apr 18 at 16:02






                        • 1





                          Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                          – cmaster
                          Apr 18 at 20:09






                        • 1





                          Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                          – cmaster
                          Apr 18 at 20:11






                        • 1





                          gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                          – Peter Cordes
                          Apr 19 at 9:38













                        • 1





                          Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                          – cmaster
                          Apr 18 at 15:53











                        • @cmaster Well, that's debatable ;)

                          – Bob__
                          Apr 18 at 16:02






                        • 1





                          Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                          – cmaster
                          Apr 18 at 20:09






                        • 1





                          Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                          – cmaster
                          Apr 18 at 20:11






                        • 1





                          gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                          – Peter Cordes
                          Apr 19 at 9:38








                        1




                        1





                        Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                        – cmaster
                        Apr 18 at 15:53





                        Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made int array[length]; legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]) nor on the heap (int (*array2d)[width] = new int[height][width];). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));) since 20 years ago...

                        – cmaster
                        Apr 18 at 15:53













                        @cmaster Well, that's debatable ;)

                        – Bob__
                        Apr 18 at 16:02





                        @cmaster Well, that's debatable ;)

                        – Bob__
                        Apr 18 at 16:02




                        1




                        1





                        Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                        – cmaster
                        Apr 18 at 20:09





                        Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.

                        – cmaster
                        Apr 18 at 20:09




                        1




                        1





                        Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                        – cmaster
                        Apr 18 at 20:11





                        Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.

                        – cmaster
                        Apr 18 at 20:11




                        1




                        1





                        gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                        – Peter Cordes
                        Apr 19 at 9:38






                        gcc does show a difference there, because it's actually doing worse with signed int. Overall it makes faster code, but you can see that it using sar / sub / movslq to sign-extend back to 64-bit. I suggested using size_t re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1 in the addr mode.) Anyway, unsigned long does give an extra speedup with gcc better than unsigned. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int also is the same.)

                        – Peter Cordes
                        Apr 19 at 9:38












                        0


















                        I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:




                        • vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16


                        • vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52


                        • bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69

                        Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).




                        #include <stdio.h>
                        #include <stdlib.h>
                        #include <sys/resource.h>
                        #include <vector>

                        size_t countPrimes(size_t n)
                        std::vector<bool> flag(n+1,1);
                        //std::vector<char> flag(n+1,1);
                        //bool flag[n+1]; std::fill(flag,flag+n+1,true);
                        for(size_t i=2;i<n;i++)
                        if(flag[i]==1)
                        for(size_t j=i;i*j<n;j++)
                        flag[i*j]=0;



                        size_t result=0;
                        for(size_t i=2;i<n;i++)
                        result+=flag[i];

                        return result;


                        int main()

                        const rlim_t kStackSize = 16*1024*1024;
                        struct rlimit rl;
                        int result = getrlimit(RLIMIT_STACK, &rl);
                        if(result != 0) abort();
                        if(rl.rlim_cur < kStackSize)
                        rl.rlim_cur = kStackSize;
                        result = setrlimit(RLIMIT_STACK, &rl);
                        if(result != 0) abort();



                        printf("%zun", countPrimes(10e6));
                        return 0;






                        share|improve this answer






























                          0


















                          I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:




                          • vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16


                          • vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52


                          • bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69

                          Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).




                          #include <stdio.h>
                          #include <stdlib.h>
                          #include <sys/resource.h>
                          #include <vector>

                          size_t countPrimes(size_t n)
                          std::vector<bool> flag(n+1,1);
                          //std::vector<char> flag(n+1,1);
                          //bool flag[n+1]; std::fill(flag,flag+n+1,true);
                          for(size_t i=2;i<n;i++)
                          if(flag[i]==1)
                          for(size_t j=i;i*j<n;j++)
                          flag[i*j]=0;



                          size_t result=0;
                          for(size_t i=2;i<n;i++)
                          result+=flag[i];

                          return result;


                          int main()

                          const rlim_t kStackSize = 16*1024*1024;
                          struct rlimit rl;
                          int result = getrlimit(RLIMIT_STACK, &rl);
                          if(result != 0) abort();
                          if(rl.rlim_cur < kStackSize)
                          rl.rlim_cur = kStackSize;
                          result = setrlimit(RLIMIT_STACK, &rl);
                          if(result != 0) abort();



                          printf("%zun", countPrimes(10e6));
                          return 0;






                          share|improve this answer




























                            0














                            0










                            0









                            I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:




                            • vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16


                            • vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52


                            • bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69

                            Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).




                            #include <stdio.h>
                            #include <stdlib.h>
                            #include <sys/resource.h>
                            #include <vector>

                            size_t countPrimes(size_t n)
                            std::vector<bool> flag(n+1,1);
                            //std::vector<char> flag(n+1,1);
                            //bool flag[n+1]; std::fill(flag,flag+n+1,true);
                            for(size_t i=2;i<n;i++)
                            if(flag[i]==1)
                            for(size_t j=i;i*j<n;j++)
                            flag[i*j]=0;



                            size_t result=0;
                            for(size_t i=2;i<n;i++)
                            result+=flag[i];

                            return result;


                            int main()

                            const rlim_t kStackSize = 16*1024*1024;
                            struct rlimit rl;
                            int result = getrlimit(RLIMIT_STACK, &rl);
                            if(result != 0) abort();
                            if(rl.rlim_cur < kStackSize)
                            rl.rlim_cur = kStackSize;
                            result = setrlimit(RLIMIT_STACK, &rl);
                            if(result != 0) abort();



                            printf("%zun", countPrimes(10e6));
                            return 0;






                            share|improve this answer














                            I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall and running on a Ryzen 5 1600 CPU:




                            • vector<bool>: 0.038 seconds, 3344 KiB memory, IPC 3.16


                            • vector<char>: 0.048 seconds, 12004 KiB memory, IPC 1.52


                            • bool[N]: 0.050 seconds, 12644 KiB memory, IPC 1.69

                            Conclusion: vector<bool> is the fastest option because of its higher IPC (instructions per clock).




                            #include <stdio.h>
                            #include <stdlib.h>
                            #include <sys/resource.h>
                            #include <vector>

                            size_t countPrimes(size_t n)
                            std::vector<bool> flag(n+1,1);
                            //std::vector<char> flag(n+1,1);
                            //bool flag[n+1]; std::fill(flag,flag+n+1,true);
                            for(size_t i=2;i<n;i++)
                            if(flag[i]==1)
                            for(size_t j=i;i*j<n;j++)
                            flag[i*j]=0;



                            size_t result=0;
                            for(size_t i=2;i<n;i++)
                            result+=flag[i];

                            return result;


                            int main()

                            const rlim_t kStackSize = 16*1024*1024;
                            struct rlimit rl;
                            int result = getrlimit(RLIMIT_STACK, &rl);
                            if(result != 0) abort();
                            if(rl.rlim_cur < kStackSize)
                            rl.rlim_cur = kStackSize;
                            result = setrlimit(RLIMIT_STACK, &rl);
                            if(result != 0) abort();



                            printf("%zun", countPrimes(10e6));
                            return 0;







                            share|improve this answer













                            share|improve this answer




                            share|improve this answer



                            share|improve this answer










                            answered Apr 19 at 13:02









                            atomsymbolatomsymbol

                            2165 silver badges10 bronze badges




                            2165 silver badges10 bronze badges































                                draft saved

                                draft discarded















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid


                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.

                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function ()
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55745312%2fperformance-gap-between-vectorbool-and-array%23new-answer', 'question_page');

                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown









                                Popular posts from this blog

                                Tamil (spriik) Luke uk diar | Nawigatjuun

                                Align equal signs while including text over equalitiesAMS align: left aligned text/math plus multicolumn alignmentMultiple alignmentsAligning equations in multiple placesNumbering and aligning an equation with multiple columnsHow to align one equation with another multline equationUsing \ in environments inside the begintabularxNumber equations and preserving alignment of equal signsHow can I align equations to the left and to the right?Double equation alignment problem within align enviromentAligned within align: Why are they right-aligned?

                                Training a classifier when some of the features are unknownWhy does Gradient Boosting regression predict negative values when there are no negative y-values in my training set?How to improve an existing (trained) classifier?What is effect when I set up some self defined predisctor variables?Why Matlab neural network classification returns decimal values on prediction dataset?Fitting and transforming text data in training, testing, and validation setsHow to quantify the performance of the classifier (multi-class SVM) using the test data?How do I control for some patients providing multiple samples in my training data?Training and Test setTraining a convolutional neural network for image denoising in MatlabShouldn't an autoencoder with #(neurons in hidden layer) = #(neurons in input layer) be “perfect”?