Performance gap between vector and arrayIs std::vector so much slower than plain arrays?Why aren't variable-length arrays part of the C++ standard?Using arrays or std::vectors in C++, what's the performance gap?Create ArrayList from arrayHow do I check if an array includes an object in JavaScript?How to append something to an array?Improve INSERT-per-second performance of SQLite?What is the difference between call and apply?Loop through an array in JavaScriptHow do I remove a particular element from an array in JavaScript?For-each over an array in JavaScript?Why is processing a sorted array faster than processing an unsorted array?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations
What is joint estimation?
What are the consequences for downstream actors of redistributing a work under a wider CC license than the copyright holder authorized?
Can the bass be used instead of drums?
Is self-defense mutually exclusive of murder?
Why are seats at the rear of a plane sometimes unavailable even though many other seats are available in the plane?
What can I do to avoid potential charges for bribery?
Type and strength of a typical chain
Is consistent disregard for students' time "normal" in undergraduate research?
An employee has low self-confidence, and is performing poorly. How can I help?
Can I use I2C over 2m cables?
Dedicated solver for convex problems
Match the blocks
Suspicious crontab entry running 'xribfa4' every 15 minutes
Why is matter-antimatter asymmetry surprising, if asymmetry can be generated by a random walk in which particles go into black holes?
A demigod among men
Why did the range based for loop specification change in C++17
How do you translate "Don't Fear the Reaper" into Latin?
Can you decide not to sneak into a room after seeing your roll?
What good is Divine Sense?
Why is CMYK & PNG not possible?
Does Windows 10 Fast Startup feature drain battery while laptop is turned off?
My name was added to manuscript as co-author without my consent; how to get it removed?
Would Anti-Magic Zone Affect Dragon Breath?
What is a terminal plane in high frequency circuits?
Performance gap between vector and array
Is std::vector so much slower than plain arrays?Why aren't variable-length arrays part of the C++ standard?Using arrays or std::vectors in C++, what's the performance gap?Create ArrayList from arrayHow do I check if an array includes an object in JavaScript?How to append something to an array?Improve INSERT-per-second performance of SQLite?What is the difference between call and apply?Loop through an array in JavaScriptHow do I remove a particular element from an array in JavaScript?For-each over an array in JavaScript?Why is processing a sorted array faster than processing an unsorted array?Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;
I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n
.
So I first came up with some code:
int countPrimes(int n)
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:
int countPrimes(int n)
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.
EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool>
with vector<char>
.
c++ arrays performance vector std
|
show 2 more comments
I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n
.
So I first came up with some code:
int countPrimes(int n)
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:
int countPrimes(int n)
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.
EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool>
with vector<char>
.
c++ arrays performance vector std
6
How are you compiling your code? What compiler options? Also;vector<bool>
is special.
– Jesper Juhl
Apr 18 at 11:45
13
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
3
you may gain some time changing one loop a bit:for(int i = 2; i * i <n;i++)
since ifi * i >= n
then next loop does nothing.
– Marek R
Apr 18 at 12:20
3
This doesn't address the question, but when you're dealing with boolean types, usetrue
andfalse
and not1
. So:vector<bool> flag(n+1, true);
andif (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.
– Pete Becker
Apr 18 at 12:53
4
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22
|
show 2 more comments
I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n
.
So I first came up with some code:
int countPrimes(int n)
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:
int countPrimes(int n)
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.
EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool>
with vector<char>
.
c++ arrays performance vector std
I was trying to solve a coding problem in C++ which counts the number of prime numbers less than a non-negative number n
.
So I first came up with some code:
int countPrimes(int n)
vector<bool> flag(n+1,1);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 88 ms and uses 8.6 MB of memory. Then I changed my code into:
int countPrimes(int n)
// vector<bool> flag(n+1,1);
bool flag[n+1] ;
fill(flag,flag+n+1,true);
for(int i =2;i<n;i++)
if(flag[i]==1)
for(long j=i;i*j<n;j++)
flag[i*j]=0;
int result=0;
for(int i =2;i<n;i++)
result+=flag[i];
return result;
which takes 28 ms and 9.9 MB. I don't really understand why there is such a performance gap in both the running time and memory consumption. I have read relative questions like this one and that one but I am still confused.
EDIT: I reduced the running time to 40 ms with 11.5 MB of memory after replacing vector<bool>
with vector<char>
.
c++ arrays performance vector std
c++ arrays performance vector std
edited Apr 18 at 22:29
isanae
2,6551 gold badge16 silver badges38 bronze badges
2,6551 gold badge16 silver badges38 bronze badges
asked Apr 18 at 11:43
Qin HeyangQin Heyang
4373 silver badges8 bronze badges
4373 silver badges8 bronze badges
6
How are you compiling your code? What compiler options? Also;vector<bool>
is special.
– Jesper Juhl
Apr 18 at 11:45
13
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
3
you may gain some time changing one loop a bit:for(int i = 2; i * i <n;i++)
since ifi * i >= n
then next loop does nothing.
– Marek R
Apr 18 at 12:20
3
This doesn't address the question, but when you're dealing with boolean types, usetrue
andfalse
and not1
. So:vector<bool> flag(n+1, true);
andif (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.
– Pete Becker
Apr 18 at 12:53
4
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22
|
show 2 more comments
6
How are you compiling your code? What compiler options? Also;vector<bool>
is special.
– Jesper Juhl
Apr 18 at 11:45
13
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
3
you may gain some time changing one loop a bit:for(int i = 2; i * i <n;i++)
since ifi * i >= n
then next loop does nothing.
– Marek R
Apr 18 at 12:20
3
This doesn't address the question, but when you're dealing with boolean types, usetrue
andfalse
and not1
. So:vector<bool> flag(n+1, true);
andif (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.
– Pete Becker
Apr 18 at 12:53
4
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22
6
6
How are you compiling your code? What compiler options? Also;
vector<bool>
is special.– Jesper Juhl
Apr 18 at 11:45
How are you compiling your code? What compiler options? Also;
vector<bool>
is special.– Jesper Juhl
Apr 18 at 11:45
13
13
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
3
3
you may gain some time changing one loop a bit:
for(int i = 2; i * i <n;i++)
since if i * i >= n
then next loop does nothing.– Marek R
Apr 18 at 12:20
you may gain some time changing one loop a bit:
for(int i = 2; i * i <n;i++)
since if i * i >= n
then next loop does nothing.– Marek R
Apr 18 at 12:20
3
3
This doesn't address the question, but when you're dealing with boolean types, use
true
and false
and not 1
. So: vector<bool> flag(n+1, true);
and if (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.– Pete Becker
Apr 18 at 12:53
This doesn't address the question, but when you're dealing with boolean types, use
true
and false
and not 1
. So: vector<bool> flag(n+1, true);
and if (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.– Pete Becker
Apr 18 at 12:53
4
4
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22
|
show 2 more comments
4 Answers
4
active
oldest
votes
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization of
std::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
add a comment
|
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
add a comment
|
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
if ( sieve[i] == 0 )
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
return result;
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have madeint array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...
– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
gcc does show a difference there, because it's actually doing worse with signedint
. Overall it makes faster code, but you can see that it usingsar
/sub
/movslq
to sign-extend back to 64-bit. I suggested usingsize_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use-1
in the addr mode.) Anyway,unsigned long
does give an extra speedup with gcc better thanunsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)
– Peter Cordes
Apr 19 at 9:38
|
show 3 more comments
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall
and running on a Ryzen 5 1600 CPU:
vector<bool>
: 0.038 seconds, 3344 KiB memory, IPC 3.16vector<char>
: 0.048 seconds, 12004 KiB memory, IPC 1.52bool[N]
: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool>
is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n)
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++)
if(flag[i]==1)
for(size_t j=i;i*j<n;j++)
flag[i*j]=0;
size_t result=0;
for(size_t i=2;i<n;i++)
result+=flag[i];
return result;
int main()
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize)
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
printf("%zun", countPrimes(10e6));
return 0;
add a comment
|
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55745312%2fperformance-gap-between-vectorbool-and-array%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization of
std::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
add a comment
|
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization of
std::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
add a comment
|
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization of
std::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization of
std::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
answered Apr 18 at 11:52
BlazeBlaze
12k1 gold badge17 silver badges37 bronze badges
12k1 gold badge17 silver badges37 bronze badges
add a comment
|
add a comment
|
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
add a comment
|
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
add a comment
|
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
edited Apr 18 at 12:15
Jesper Juhl
1
1
answered Apr 18 at 11:53
Marek RMarek R
15.7k3 gold badges30 silver badges81 bronze badges
15.7k3 gold badges30 silver badges81 bronze badges
add a comment
|
add a comment
|
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
if ( sieve[i] == 0 )
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
return result;
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have madeint array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...
– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
gcc does show a difference there, because it's actually doing worse with signedint
. Overall it makes faster code, but you can see that it usingsar
/sub
/movslq
to sign-extend back to 64-bit. I suggested usingsize_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use-1
in the addr mode.) Anyway,unsigned long
does give an extra speedup with gcc better thanunsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)
– Peter Cordes
Apr 19 at 9:38
|
show 3 more comments
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
if ( sieve[i] == 0 )
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
return result;
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have madeint array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...
– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
gcc does show a difference there, because it's actually doing worse with signedint
. Overall it makes faster code, but you can see that it usingsar
/sub
/movslq
to sign-extend back to 64-bit. I suggested usingsize_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use-1
in the addr mode.) Anyway,unsigned long
does give an extra speedup with gcc better thanunsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)
– Peter Cordes
Apr 19 at 9:38
|
show 3 more comments
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
if ( sieve[i] == 0 )
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
return result;
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
if ( sieve[i] == 0 )
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
return result;
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.
edited Apr 19 at 9:37
answered Apr 18 at 14:59
Bob__Bob__
6,1233 gold badges17 silver badges28 bronze badges
6,1233 gold badges17 silver badges28 bronze badges
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have madeint array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...
– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
gcc does show a difference there, because it's actually doing worse with signedint
. Overall it makes faster code, but you can see that it usingsar
/sub
/movslq
to sign-extend back to 64-bit. I suggested usingsize_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use-1
in the addr mode.) Anyway,unsigned long
does give an extra speedup with gcc better thanunsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)
– Peter Cordes
Apr 19 at 9:38
|
show 3 more comments
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have madeint array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...
– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
gcc does show a difference there, because it's actually doing worse with signedint
. Overall it makes faster code, but you can see that it usingsar
/sub
/movslq
to sign-extend back to 64-bit. I suggested usingsize_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use-1
in the addr mode.) Anyway,unsigned long
does give an extra speedup with gcc better thanunsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)
– Peter Cordes
Apr 19 at 9:38
1
1
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made
int array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...– cmaster
Apr 18 at 15:53
Wow, 20 years after C99, and not even the most rudimentary VLAs... Looks like C++ is getting more and more behind C ;-) Afaik, there was once a proposal that would have made
int array[length];
legal in C++, but even that proposal did not allow for multidimensional arrays, neither on the stack (int array2d[height][width]
) nor on the heap (int (*array2d)[width] = new int[height][width];
). Apparently, that proposal was never approved, even though C goes as far as to allow the multidimensional case (int (*array2d)[width] = malloc(height * sizeof(*array2d));
) since 20 years ago...– cmaster
Apr 18 at 15:53
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
@cmaster Well, that's debatable ;)
– Bob__
Apr 18 at 16:02
1
1
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
Yeah, I know. Especially since very few people actually understand the power of the C99 syntax - both of the top-scoring answers of the question you linked completely underestimate its value. Because, you see, C99 VLAs not just exist on the stack, they can also be allocated on the heap. I have already given an example of that, it's the last one. The real power comes from the fact that any C99 array type can have a runtime length, it can be pointed to or typedefed. And C99 does not care how many runtime length array types you nest, the indexing calculation comes out right.
– cmaster
Apr 18 at 20:09
1
1
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
Long story short: I really enjoy working with C++ for the most part, but when it comes to manipulating multidimensional arrays of plain data, (images, volumes, simulations), I'll just fall back to C99 for its VLA syntax.
– cmaster
Apr 18 at 20:11
1
1
gcc does show a difference there, because it's actually doing worse with signed
int
. Overall it makes faster code, but you can see that it using sar
/ sub
/ movslq
to sign-extend back to 64-bit. I suggested using size_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1
in the addr mode.) Anyway, unsigned long
does give an extra speedup with gcc better than unsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)– Peter Cordes
Apr 19 at 9:38
gcc does show a difference there, because it's actually doing worse with signed
int
. Overall it makes faster code, but you can see that it using sar
/ sub
/ movslq
to sign-extend back to 64-bit. I suggested using size_t
re-extension to pointer width wouldn't be necessary. (Although zero-extension is free on x86-64; writing a 32-bit reg zero-extends into the 64-bit reg, but 64 lets it use -1
in the addr mode.) Anyway, unsigned long
does give an extra speedup with gcc better than unsigned
. quick-bench.com/4t9DE9nsT-oQYSNo0leeyYZ8GvY. (long int
also is the same.)– Peter Cordes
Apr 19 at 9:38
|
show 3 more comments
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall
and running on a Ryzen 5 1600 CPU:
vector<bool>
: 0.038 seconds, 3344 KiB memory, IPC 3.16vector<char>
: 0.048 seconds, 12004 KiB memory, IPC 1.52bool[N]
: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool>
is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n)
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++)
if(flag[i]==1)
for(size_t j=i;i*j<n;j++)
flag[i*j]=0;
size_t result=0;
for(size_t i=2;i<n;i++)
result+=flag[i];
return result;
int main()
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize)
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
printf("%zun", countPrimes(10e6));
return 0;
add a comment
|
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall
and running on a Ryzen 5 1600 CPU:
vector<bool>
: 0.038 seconds, 3344 KiB memory, IPC 3.16vector<char>
: 0.048 seconds, 12004 KiB memory, IPC 1.52bool[N]
: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool>
is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n)
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++)
if(flag[i]==1)
for(size_t j=i;i*j<n;j++)
flag[i*j]=0;
size_t result=0;
for(size_t i=2;i<n;i++)
result+=flag[i];
return result;
int main()
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize)
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
printf("%zun", countPrimes(10e6));
return 0;
add a comment
|
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall
and running on a Ryzen 5 1600 CPU:
vector<bool>
: 0.038 seconds, 3344 KiB memory, IPC 3.16vector<char>
: 0.048 seconds, 12004 KiB memory, IPC 1.52bool[N]
: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool>
is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n)
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++)
if(flag[i]==1)
for(size_t j=i;i*j<n;j++)
flag[i*j]=0;
size_t result=0;
for(size_t i=2;i<n;i++)
result+=flag[i];
return result;
int main()
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize)
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
printf("%zun", countPrimes(10e6));
return 0;
I am getting different timings and memory usage than the ones mentioned in the question when compiling with g++-7.4.0 -g -march=native -O2 -Wall
and running on a Ryzen 5 1600 CPU:
vector<bool>
: 0.038 seconds, 3344 KiB memory, IPC 3.16vector<char>
: 0.048 seconds, 12004 KiB memory, IPC 1.52bool[N]
: 0.050 seconds, 12644 KiB memory, IPC 1.69
Conclusion: vector<bool>
is the fastest option because of its higher IPC (instructions per clock).
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <vector>
size_t countPrimes(size_t n)
std::vector<bool> flag(n+1,1);
//std::vector<char> flag(n+1,1);
//bool flag[n+1]; std::fill(flag,flag+n+1,true);
for(size_t i=2;i<n;i++)
if(flag[i]==1)
for(size_t j=i;i*j<n;j++)
flag[i*j]=0;
size_t result=0;
for(size_t i=2;i<n;i++)
result+=flag[i];
return result;
int main()
const rlim_t kStackSize = 16*1024*1024;
struct rlimit rl;
int result = getrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
if(rl.rlim_cur < kStackSize)
rl.rlim_cur = kStackSize;
result = setrlimit(RLIMIT_STACK, &rl);
if(result != 0) abort();
printf("%zun", countPrimes(10e6));
return 0;
answered Apr 19 at 13:02
atomsymbolatomsymbol
2165 silver badges10 bronze badges
2165 silver badges10 bronze badges
add a comment
|
add a comment
|
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55745312%2fperformance-gap-between-vectorbool-and-array%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
6
How are you compiling your code? What compiler options? Also;
vector<bool>
is special.– Jesper Juhl
Apr 18 at 11:45
13
be carfull, std::vector<bool> is a weird specialisation, more a bitset than a vector
– Martin Morterol
Apr 18 at 11:49
3
you may gain some time changing one loop a bit:
for(int i = 2; i * i <n;i++)
since ifi * i >= n
then next loop does nothing.– Marek R
Apr 18 at 12:20
3
This doesn't address the question, but when you're dealing with boolean types, use
true
andfalse
and not1
. So:vector<bool> flag(n+1, true);
andif (flag[i])
. That doesn't affect the result, but it makes it much clearer what you're doing.– Pete Becker
Apr 18 at 12:53
4
If you are not already doing so, make sure to compile your code with compiler optimizations enabled before benchmarking.
– Jesper Juhl
Apr 18 at 14:22