Why have apparent memory bitflips in non-ECC memory not increased?Freezing DRAM for forensics (coldboot)How does SDRAM refresh interact with ECCWhy does flash memory have a lifespan?PIC32 does not have Memory Managment Unit, why would a high end microcontroller not have such a peripheralFlash memory data retention timeWhy use a buffer in/output for a RAM memory?Difference between a memory cell and a memory chip?Why does this RAM component have unpredictable behavior in Multisim?DDR3 Data Errors

How acceptable is an ellipsis "..." in formal mathematics?

What ways are there to bypass spell resistance?

What is the fastest way to move in Borderlands 3?

How can I remove rest of file from string for all files?

Shieldgate - Tied Village Clusters

Accounting for intervals not present within the chord ratio?

Should a grammatical article be a part of a web link anchor

Can something have more sugar per 100g than the percentage of sugar that's in it?

Is there a more efficient alternative to pull down resistors?

Sanitise a high score table

Can massive damage kill you while at 0 HP?

I am confused with the word order when putting a sentence into passé composé with reflexive verbs

one-liner vs script

Why didn't Trudy wear a breathing mask in Avatar?

How to find an internship in OR/Optimization?

Should I reveal productivity tricks to peers, or keep them to myself in order to be more productive than the others?

Reduction of carbamate with LAH

Is having your hand in your pocket during a presentation bad?

Is there such thing as plasma (from reentry) creating lift?

Why has Donald Trump's popularity remained so stable over a rather long period of time?

Is there a simple way to apply a function to the RHS of a substitution?

Had there been instances of national states banning harmful imports before the mid-19th C Opium Wars?

Relation between signal processing and control systems engineering?

Does the Creighton Method of Natural Family Planning have a failure rate of 3.2% or less?



Why have apparent memory bitflips in non-ECC memory not increased?


Freezing DRAM for forensics (coldboot)How does SDRAM refresh interact with ECCWhy does flash memory have a lifespan?PIC32 does not have Memory Managment Unit, why would a high end microcontroller not have such a peripheralFlash memory data retention timeWhy use a buffer in/output for a RAM memory?Difference between a memory cell and a memory chip?Why does this RAM component have unpredictable behavior in Multisim?DDR3 Data Errors






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;









5












$begingroup$


Back in the early 2000s I remember asking about why it was so important that servers use ECC memory. The prevailing wisdom at the time was that systems with lots of RAM would be, statistically, more likely to suffer bitflips. This makes sense - if each cell has a 10-20 probability of suffering a bitflip per second, then 109 cells have a 10-11 probability per second. The more cells you have, the higher the probability of a bitflip in any given time period.



Back then we would be looking at a ballpark of 128MB to 1GB of RAM. These days we regularly put 16GB or more in laptops and desktops, with workstations commonly having 64GB or more. For argument's sake, let's say we've increased total RAM amounts by two orders of magnitude. So we should see a hundred times or so more bitflips, on average, in any given system, assuming that nothing else changed.



The more I thought about it, though, the more I realised that the random bitflip rate should be much higher in newer systems:



  • Lower operating voltages means lower distinction between a 0 and a 1.

  • Lower gate charge means less overall energy required to flip a bit.

  • More densely packed gates increases the likelihood of being affected by cosmic rays.

  • Refresh times don't seem to have gone anywhere. DDR2 tRFC was 40-60 clocks, DDR3 tRFC was more like 90-130 clocks, and DDR4 tRFC is more like 200-450 clocks. When you divide by the internal memory clock rates to get a wall time for each refresh timing it doesn't really show much of a trend - it's effectively flat but with a higher margin either way as time goes on.

But, as far as I know, we're not seeing bitflips everywhere on non-ECC RAM, at least within the confines of our atmosphere.



So, what's the deal? Why aren't we seeing endless bitflips everywhere, at least 100x if not 10000x more frequently than two decades ago? Is ECC actually important in the context of growing RAM sizes, or do the stats not back it up? Or is there some other technology advance that is mitigating bitflip problems in non-ECC memory? I'm particularly interested in answers with authoritative references rather than speculation about error rates.










share|improve this question











$endgroup$









  • 2




    $begingroup$
    @SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
    $endgroup$
    – Polynomial
    Apr 17 at 13:15






  • 2




    $begingroup$
    @SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
    $endgroup$
    – Polynomial
    Apr 17 at 13:29






  • 3




    $begingroup$
    To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:33







  • 2




    $begingroup$
    @SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
    $endgroup$
    – Polynomial
    Apr 17 at 13:36






  • 3




    $begingroup$
    The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:38

















5












$begingroup$


Back in the early 2000s I remember asking about why it was so important that servers use ECC memory. The prevailing wisdom at the time was that systems with lots of RAM would be, statistically, more likely to suffer bitflips. This makes sense - if each cell has a 10-20 probability of suffering a bitflip per second, then 109 cells have a 10-11 probability per second. The more cells you have, the higher the probability of a bitflip in any given time period.



Back then we would be looking at a ballpark of 128MB to 1GB of RAM. These days we regularly put 16GB or more in laptops and desktops, with workstations commonly having 64GB or more. For argument's sake, let's say we've increased total RAM amounts by two orders of magnitude. So we should see a hundred times or so more bitflips, on average, in any given system, assuming that nothing else changed.



The more I thought about it, though, the more I realised that the random bitflip rate should be much higher in newer systems:



  • Lower operating voltages means lower distinction between a 0 and a 1.

  • Lower gate charge means less overall energy required to flip a bit.

  • More densely packed gates increases the likelihood of being affected by cosmic rays.

  • Refresh times don't seem to have gone anywhere. DDR2 tRFC was 40-60 clocks, DDR3 tRFC was more like 90-130 clocks, and DDR4 tRFC is more like 200-450 clocks. When you divide by the internal memory clock rates to get a wall time for each refresh timing it doesn't really show much of a trend - it's effectively flat but with a higher margin either way as time goes on.

But, as far as I know, we're not seeing bitflips everywhere on non-ECC RAM, at least within the confines of our atmosphere.



So, what's the deal? Why aren't we seeing endless bitflips everywhere, at least 100x if not 10000x more frequently than two decades ago? Is ECC actually important in the context of growing RAM sizes, or do the stats not back it up? Or is there some other technology advance that is mitigating bitflip problems in non-ECC memory? I'm particularly interested in answers with authoritative references rather than speculation about error rates.










share|improve this question











$endgroup$









  • 2




    $begingroup$
    @SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
    $endgroup$
    – Polynomial
    Apr 17 at 13:15






  • 2




    $begingroup$
    @SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
    $endgroup$
    – Polynomial
    Apr 17 at 13:29






  • 3




    $begingroup$
    To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:33







  • 2




    $begingroup$
    @SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
    $endgroup$
    – Polynomial
    Apr 17 at 13:36






  • 3




    $begingroup$
    The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:38













5












5








5


1



$begingroup$


Back in the early 2000s I remember asking about why it was so important that servers use ECC memory. The prevailing wisdom at the time was that systems with lots of RAM would be, statistically, more likely to suffer bitflips. This makes sense - if each cell has a 10-20 probability of suffering a bitflip per second, then 109 cells have a 10-11 probability per second. The more cells you have, the higher the probability of a bitflip in any given time period.



Back then we would be looking at a ballpark of 128MB to 1GB of RAM. These days we regularly put 16GB or more in laptops and desktops, with workstations commonly having 64GB or more. For argument's sake, let's say we've increased total RAM amounts by two orders of magnitude. So we should see a hundred times or so more bitflips, on average, in any given system, assuming that nothing else changed.



The more I thought about it, though, the more I realised that the random bitflip rate should be much higher in newer systems:



  • Lower operating voltages means lower distinction between a 0 and a 1.

  • Lower gate charge means less overall energy required to flip a bit.

  • More densely packed gates increases the likelihood of being affected by cosmic rays.

  • Refresh times don't seem to have gone anywhere. DDR2 tRFC was 40-60 clocks, DDR3 tRFC was more like 90-130 clocks, and DDR4 tRFC is more like 200-450 clocks. When you divide by the internal memory clock rates to get a wall time for each refresh timing it doesn't really show much of a trend - it's effectively flat but with a higher margin either way as time goes on.

But, as far as I know, we're not seeing bitflips everywhere on non-ECC RAM, at least within the confines of our atmosphere.



So, what's the deal? Why aren't we seeing endless bitflips everywhere, at least 100x if not 10000x more frequently than two decades ago? Is ECC actually important in the context of growing RAM sizes, or do the stats not back it up? Or is there some other technology advance that is mitigating bitflip problems in non-ECC memory? I'm particularly interested in answers with authoritative references rather than speculation about error rates.










share|improve this question











$endgroup$




Back in the early 2000s I remember asking about why it was so important that servers use ECC memory. The prevailing wisdom at the time was that systems with lots of RAM would be, statistically, more likely to suffer bitflips. This makes sense - if each cell has a 10-20 probability of suffering a bitflip per second, then 109 cells have a 10-11 probability per second. The more cells you have, the higher the probability of a bitflip in any given time period.



Back then we would be looking at a ballpark of 128MB to 1GB of RAM. These days we regularly put 16GB or more in laptops and desktops, with workstations commonly having 64GB or more. For argument's sake, let's say we've increased total RAM amounts by two orders of magnitude. So we should see a hundred times or so more bitflips, on average, in any given system, assuming that nothing else changed.



The more I thought about it, though, the more I realised that the random bitflip rate should be much higher in newer systems:



  • Lower operating voltages means lower distinction between a 0 and a 1.

  • Lower gate charge means less overall energy required to flip a bit.

  • More densely packed gates increases the likelihood of being affected by cosmic rays.

  • Refresh times don't seem to have gone anywhere. DDR2 tRFC was 40-60 clocks, DDR3 tRFC was more like 90-130 clocks, and DDR4 tRFC is more like 200-450 clocks. When you divide by the internal memory clock rates to get a wall time for each refresh timing it doesn't really show much of a trend - it's effectively flat but with a higher margin either way as time goes on.

But, as far as I know, we're not seeing bitflips everywhere on non-ECC RAM, at least within the confines of our atmosphere.



So, what's the deal? Why aren't we seeing endless bitflips everywhere, at least 100x if not 10000x more frequently than two decades ago? Is ECC actually important in the context of growing RAM sizes, or do the stats not back it up? Or is there some other technology advance that is mitigating bitflip problems in non-ECC memory? I'm particularly interested in answers with authoritative references rather than speculation about error rates.







memory ram error error-correction ecc






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Apr 17 at 13:46







Polynomial

















asked Apr 17 at 13:03









PolynomialPolynomial

3,0404 gold badges31 silver badges59 bronze badges




3,0404 gold badges31 silver badges59 bronze badges










  • 2




    $begingroup$
    @SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
    $endgroup$
    – Polynomial
    Apr 17 at 13:15






  • 2




    $begingroup$
    @SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
    $endgroup$
    – Polynomial
    Apr 17 at 13:29






  • 3




    $begingroup$
    To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:33







  • 2




    $begingroup$
    @SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
    $endgroup$
    – Polynomial
    Apr 17 at 13:36






  • 3




    $begingroup$
    The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:38












  • 2




    $begingroup$
    @SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
    $endgroup$
    – Polynomial
    Apr 17 at 13:15






  • 2




    $begingroup$
    @SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
    $endgroup$
    – Polynomial
    Apr 17 at 13:29






  • 3




    $begingroup$
    To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:33







  • 2




    $begingroup$
    @SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
    $endgroup$
    – Polynomial
    Apr 17 at 13:36






  • 3




    $begingroup$
    The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
    $endgroup$
    – Peter Smith
    Apr 17 at 13:38







2




2




$begingroup$
@SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
$endgroup$
– Polynomial
Apr 17 at 13:15




$begingroup$
@SunnyskyguyEE75 That's irrelevant to the question and makes an assumption that the probabilistic risk of using non-ECC memory is high in the first place, which is the exact assumption that I'm challenging here. I understand probabilistic risk and impact (my career depends upon it); the question is about whether those risks are being overstated in the first place.
$endgroup$
– Polynomial
Apr 17 at 13:15




2




2




$begingroup$
@SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
$endgroup$
– Polynomial
Apr 17 at 13:29




$begingroup$
@SunnyskyguyEE75 I'm not asking how to fix the problem, or how to model probabilistic MTBFs. Just interested in authoritative references as to why a 16GB non-ECC DIMM from 2019 seems to have the same apparent MTBF as a 128MB non-ECC DIMM from 2002, despite the conventional wisdom back then saying "if you have more than 1GB of RAM you should use ECC, because of bit flips" and there being two whole orders of magnitude between the sizes, plus all the other negative factors listed above. If you want to speculate, feel free, but that's not really what I'm looking for in an answer.
$endgroup$
– Polynomial
Apr 17 at 13:29




3




3




$begingroup$
To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
$endgroup$
– Peter Smith
Apr 17 at 13:33





$begingroup$
To a great extent it comes down to the use case. The statistical chance of a bit flip (most commonly due to free neutrons) at sea level is really low, but non-zero, however the other part of that is that you may not notice because the bit that gets flipped may be in an area not currently being used. A study by Boeing on servers in Denver established that bit flips (SEUs in the trade) do indeed occur regularly. In avionics we are required to use ECC as the chance increases drastically with altitude.
$endgroup$
– Peter Smith
Apr 17 at 13:33





2




2




$begingroup$
@SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
$endgroup$
– Polynomial
Apr 17 at 13:36




$begingroup$
@SunnyskyguyEE75 As I've said, I'm not looking to predict it. I work in infosec; I understand the difference between risk/probabilities and predicting actual discrete incidents. It's practically the same as radioactivity - you can model decays/sec, but you can't predict the individual decays with any certainty. I'm asking about differences between rates. "Ask Crucial" isn't a useful answer here.
$endgroup$
– Polynomial
Apr 17 at 13:36




3




3




$begingroup$
The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
$endgroup$
– Peter Smith
Apr 17 at 13:38




$begingroup$
The only company I know of to do a long term study is Xilinx on the Rosetta project. They do publish soft error rates for all their parts. Note that internal geometries and layout (which is used to reduce the neutron cross section) play a large part in effective vulnerability.
$endgroup$
– Peter Smith
Apr 17 at 13:38










2 Answers
2






active

oldest

votes


















4














$begingroup$

Single-event upsets (SEU) at sea level tend to be caused either by radioactive contaminants in the IC manufacturing materials (particularly the metals) generating alpha particles or by high-energy neutrons (caused by cosmic rays in the atmosphere) ionizing atoms in the silicon itself.



Over the years, manufacturers have greatly reduced the threat caused by radioactive contaminants. There are also proprietary approaches to cell layout that can help to mitigate the risk of SEU. All of this is probably going to be trade secrets and not public information.



And, no, I'm not going to do your literature search for you. However, I recommend that you go through the IEEE Transactions on Nuclear Science if you want references.






share|improve this answer









$endgroup$














  • $begingroup$
    Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
    $endgroup$
    – Sunnyskyguy EE75
    Apr 17 at 15:04










  • $begingroup$
    Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
    $endgroup$
    – Nate Strickland
    Apr 17 at 16:56


















2














$begingroup$

The answer is, more work needs to be done, and they aren't sure:




The results show that the radiation susceptibility has actually
improved somewhat for devices that have advanced to the 0.13μm level,
which contradicts earlier predictions. This trend is encouraging, but
it may not necessarily continue for devices that are scaled below 0.1
μm. It is important to note that the recent computer modeling
calculations for SEE susceptibility of scaled devices predicts a large
increase in collected charge that is directly in conflict with test
results for heavy ions and neutron soft errors.

Source: The Effect of Device Scaling on
Single-Event Effects in Advance
CMOS Devices




enter image description here

Source: The Effect of Device Scaling on
Single-Event Effects in Advance
CMOS Devices



The actual mechanism for SEE's/SEU's has more to do with voltage and geometry than size. The one would think that smaller size and smaller charge per memory element would make memory elements easier to flip and cause errors, but the effect is small and is more related to voltage and geometry. Which is good for space applications that rely increasingly on commercial technology (like cubesats).






share|improve this answer









$endgroup$
















    Your Answer






    StackExchange.ifUsing("editor", function ()
    return StackExchange.using("schematics", function ()
    StackExchange.schematics.init();
    );
    , "cicuitlab");

    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "135"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/4.0/"u003ecc by-sa 4.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );














    draft saved

    draft discarded
















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2felectronics.stackexchange.com%2fquestions%2f433058%2fwhy-have-apparent-memory-bitflips-in-non-ecc-memory-not-increased%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4














    $begingroup$

    Single-event upsets (SEU) at sea level tend to be caused either by radioactive contaminants in the IC manufacturing materials (particularly the metals) generating alpha particles or by high-energy neutrons (caused by cosmic rays in the atmosphere) ionizing atoms in the silicon itself.



    Over the years, manufacturers have greatly reduced the threat caused by radioactive contaminants. There are also proprietary approaches to cell layout that can help to mitigate the risk of SEU. All of this is probably going to be trade secrets and not public information.



    And, no, I'm not going to do your literature search for you. However, I recommend that you go through the IEEE Transactions on Nuclear Science if you want references.






    share|improve this answer









    $endgroup$














    • $begingroup$
      Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
      $endgroup$
      – Sunnyskyguy EE75
      Apr 17 at 15:04










    • $begingroup$
      Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
      $endgroup$
      – Nate Strickland
      Apr 17 at 16:56















    4














    $begingroup$

    Single-event upsets (SEU) at sea level tend to be caused either by radioactive contaminants in the IC manufacturing materials (particularly the metals) generating alpha particles or by high-energy neutrons (caused by cosmic rays in the atmosphere) ionizing atoms in the silicon itself.



    Over the years, manufacturers have greatly reduced the threat caused by radioactive contaminants. There are also proprietary approaches to cell layout that can help to mitigate the risk of SEU. All of this is probably going to be trade secrets and not public information.



    And, no, I'm not going to do your literature search for you. However, I recommend that you go through the IEEE Transactions on Nuclear Science if you want references.






    share|improve this answer









    $endgroup$














    • $begingroup$
      Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
      $endgroup$
      – Sunnyskyguy EE75
      Apr 17 at 15:04










    • $begingroup$
      Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
      $endgroup$
      – Nate Strickland
      Apr 17 at 16:56













    4














    4










    4







    $begingroup$

    Single-event upsets (SEU) at sea level tend to be caused either by radioactive contaminants in the IC manufacturing materials (particularly the metals) generating alpha particles or by high-energy neutrons (caused by cosmic rays in the atmosphere) ionizing atoms in the silicon itself.



    Over the years, manufacturers have greatly reduced the threat caused by radioactive contaminants. There are also proprietary approaches to cell layout that can help to mitigate the risk of SEU. All of this is probably going to be trade secrets and not public information.



    And, no, I'm not going to do your literature search for you. However, I recommend that you go through the IEEE Transactions on Nuclear Science if you want references.






    share|improve this answer









    $endgroup$



    Single-event upsets (SEU) at sea level tend to be caused either by radioactive contaminants in the IC manufacturing materials (particularly the metals) generating alpha particles or by high-energy neutrons (caused by cosmic rays in the atmosphere) ionizing atoms in the silicon itself.



    Over the years, manufacturers have greatly reduced the threat caused by radioactive contaminants. There are also proprietary approaches to cell layout that can help to mitigate the risk of SEU. All of this is probably going to be trade secrets and not public information.



    And, no, I'm not going to do your literature search for you. However, I recommend that you go through the IEEE Transactions on Nuclear Science if you want references.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Apr 17 at 14:11









    Elliot AldersonElliot Alderson

    12.4k2 gold badges12 silver badges25 bronze badges




    12.4k2 gold badges12 silver badges25 bronze badges














    • $begingroup$
      Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
      $endgroup$
      – Sunnyskyguy EE75
      Apr 17 at 15:04










    • $begingroup$
      Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
      $endgroup$
      – Nate Strickland
      Apr 17 at 16:56
















    • $begingroup$
      Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
      $endgroup$
      – Sunnyskyguy EE75
      Apr 17 at 15:04










    • $begingroup$
      Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
      $endgroup$
      – Nate Strickland
      Apr 17 at 16:56















    $begingroup$
    Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
    $endgroup$
    – Sunnyskyguy EE75
    Apr 17 at 15:04




    $begingroup$
    Are saying these are the only significant causes of correctable errors ? Or uncorrectable errors or both or just limited to one cause of the SEU
    $endgroup$
    – Sunnyskyguy EE75
    Apr 17 at 15:04












    $begingroup$
    Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
    $endgroup$
    – Nate Strickland
    Apr 17 at 16:56




    $begingroup$
    Building on this, for the cosmic ray induced SEU case, the probability of getting hit with a cosmic ray correlates to die area, not number of bits within that die, and in general, memory die area has actually been going down over the years.
    $endgroup$
    – Nate Strickland
    Apr 17 at 16:56













    2














    $begingroup$

    The answer is, more work needs to be done, and they aren't sure:




    The results show that the radiation susceptibility has actually
    improved somewhat for devices that have advanced to the 0.13μm level,
    which contradicts earlier predictions. This trend is encouraging, but
    it may not necessarily continue for devices that are scaled below 0.1
    μm. It is important to note that the recent computer modeling
    calculations for SEE susceptibility of scaled devices predicts a large
    increase in collected charge that is directly in conflict with test
    results for heavy ions and neutron soft errors.

    Source: The Effect of Device Scaling on
    Single-Event Effects in Advance
    CMOS Devices




    enter image description here

    Source: The Effect of Device Scaling on
    Single-Event Effects in Advance
    CMOS Devices



    The actual mechanism for SEE's/SEU's has more to do with voltage and geometry than size. The one would think that smaller size and smaller charge per memory element would make memory elements easier to flip and cause errors, but the effect is small and is more related to voltage and geometry. Which is good for space applications that rely increasingly on commercial technology (like cubesats).






    share|improve this answer









    $endgroup$



















      2














      $begingroup$

      The answer is, more work needs to be done, and they aren't sure:




      The results show that the radiation susceptibility has actually
      improved somewhat for devices that have advanced to the 0.13μm level,
      which contradicts earlier predictions. This trend is encouraging, but
      it may not necessarily continue for devices that are scaled below 0.1
      μm. It is important to note that the recent computer modeling
      calculations for SEE susceptibility of scaled devices predicts a large
      increase in collected charge that is directly in conflict with test
      results for heavy ions and neutron soft errors.

      Source: The Effect of Device Scaling on
      Single-Event Effects in Advance
      CMOS Devices




      enter image description here

      Source: The Effect of Device Scaling on
      Single-Event Effects in Advance
      CMOS Devices



      The actual mechanism for SEE's/SEU's has more to do with voltage and geometry than size. The one would think that smaller size and smaller charge per memory element would make memory elements easier to flip and cause errors, but the effect is small and is more related to voltage and geometry. Which is good for space applications that rely increasingly on commercial technology (like cubesats).






      share|improve this answer









      $endgroup$

















        2














        2










        2







        $begingroup$

        The answer is, more work needs to be done, and they aren't sure:




        The results show that the radiation susceptibility has actually
        improved somewhat for devices that have advanced to the 0.13μm level,
        which contradicts earlier predictions. This trend is encouraging, but
        it may not necessarily continue for devices that are scaled below 0.1
        μm. It is important to note that the recent computer modeling
        calculations for SEE susceptibility of scaled devices predicts a large
        increase in collected charge that is directly in conflict with test
        results for heavy ions and neutron soft errors.

        Source: The Effect of Device Scaling on
        Single-Event Effects in Advance
        CMOS Devices




        enter image description here

        Source: The Effect of Device Scaling on
        Single-Event Effects in Advance
        CMOS Devices



        The actual mechanism for SEE's/SEU's has more to do with voltage and geometry than size. The one would think that smaller size and smaller charge per memory element would make memory elements easier to flip and cause errors, but the effect is small and is more related to voltage and geometry. Which is good for space applications that rely increasingly on commercial technology (like cubesats).






        share|improve this answer









        $endgroup$



        The answer is, more work needs to be done, and they aren't sure:




        The results show that the radiation susceptibility has actually
        improved somewhat for devices that have advanced to the 0.13μm level,
        which contradicts earlier predictions. This trend is encouraging, but
        it may not necessarily continue for devices that are scaled below 0.1
        μm. It is important to note that the recent computer modeling
        calculations for SEE susceptibility of scaled devices predicts a large
        increase in collected charge that is directly in conflict with test
        results for heavy ions and neutron soft errors.

        Source: The Effect of Device Scaling on
        Single-Event Effects in Advance
        CMOS Devices




        enter image description here

        Source: The Effect of Device Scaling on
        Single-Event Effects in Advance
        CMOS Devices



        The actual mechanism for SEE's/SEU's has more to do with voltage and geometry than size. The one would think that smaller size and smaller charge per memory element would make memory elements easier to flip and cause errors, but the effect is small and is more related to voltage and geometry. Which is good for space applications that rely increasingly on commercial technology (like cubesats).







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Apr 17 at 17:23









        Voltage SpikeVoltage Spike

        39.7k12 gold badges44 silver badges116 bronze badges




        39.7k12 gold badges44 silver badges116 bronze badges































            draft saved

            draft discarded















































            Thanks for contributing an answer to Electrical Engineering Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2felectronics.stackexchange.com%2fquestions%2f433058%2fwhy-have-apparent-memory-bitflips-in-non-ecc-memory-not-increased%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Distance measures on a map of a game The 2019 Stack Overflow Developer Survey Results Are Inmin distance in a graphShortest distance path on contour plotHow to plot a tilted map?Finding points outside of a diskDelaunay link distanceAnnulus from GeoDisks: drawing a ring on a mapNegative Correlation DistanceFind distance along a path (GPS coordinates)Finding position at given distance in a GeoPathMathematics behind distance estimation using camera

            How to get a smooth, uniform ParametricPlot of a 2D Region?How to plot a complicated Region?How to exclude a region from ParametricPlotHow discretize a region placing vertices on a specific non-uniform gridHow to transform a Plot or a ParametricPlot into a RegionHow can I get a smooth plot of a bounded region?Smooth ParametricPlot3D with RegionFunction?Smooth border of a region ParametricPlotSmooth region boundarySmooth region plot from list of pointsGet minimum y of a certain x in a region

            Genealogie vun de Merowenger Vum Merowech bis zum Chilperich I. | Navigatiounsmenü