Extrapolation v. InterpolationWhat is wrong with extrapolation?Fourier/trigonometric interpolationUncertainty propagation in linear interpolationInterpolating binned data such that bin average is preservedWhat are the theoretical reasons for why extrapolation “less reliable” than interpolation?Difference between extrapolation and interpolation in higher dimensions

Parsing C-like code to extract info

What does the 1.-5. notation mean at the top right corner?

Iodine tablet correct use and efficiency

What happened to Sophie in her last encounter with Arthur?

What's an "add" chord?

How do the Martian rebels defeat Earth when they're grossly outnumbered and outgunned?

Using footnotes in fiction: children's book which can be enjoyed by adults

Extract all keys from an object (json)

How much money would I need to feel secure in my job?

Photographic Companions

Unexpected Code Coverage Reduction

Identify the Eeveelutions

How to present boolean options along with selecting exactly 1 of them as "primary"?

CO₂ level is high enough that it reduces cognitive ability. Isn't that a reason to worry?

Is current (November 2019) polling about Democrats lead over Trump trustworthy?

What to do with a bent but not broken aluminum seat stay

How to manage publications on local computer

What DC should I use for someone trying to survive indefinitely solely with an alchemy jug as their only source of food and water? (survival campaign)

Theme or Topic - what's the difference?

I have stack-exchanged through my undergrad math program. Am I likely to succeed in mathematics PhD programs?

Is there any point in adding more than 6 months' runway in savings instead of investing everything after that?

How to identify the default sender IP address?

Can a 'Second Referendum' on Brexit contain three options?

Does any country have free college & open admissions?



Extrapolation v. Interpolation


What is wrong with extrapolation?Fourier/trigonometric interpolationUncertainty propagation in linear interpolationInterpolating binned data such that bin average is preservedWhat are the theoretical reasons for why extrapolation “less reliable” than interpolation?Difference between extrapolation and interpolation in higher dimensions






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty
margin-bottom:0;









28














$begingroup$


What is the difference between extrapolation and interpolation, and what is the most precise way of using these terms?



For example, I have seen a statement in a paper using interpolation as:




"The procedure interpolates the shape of the estimated function between the bin points"




A sentence that uses both extrapolation and interpolation is, for example:




The previous step where we extrapolated the interpolated function using the Kernel method to the left and right temperature tails.




Can someone provide a clear and easy way to distinguish them and guide how to use these terms correctly with an example?










share|cite|improve this question












$endgroup$










  • 1




    $begingroup$
    A related question.
    $endgroup$
    – J. M. is not a statistician
    Jul 25 at 7:18






  • 1




    $begingroup$
    Possible duplicate of What is wrong with extrapolation?
    $endgroup$
    – usεr11852 says Reinstate Monic
    Jul 25 at 8:21










  • $begingroup$
    @usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 25 at 11:55










  • $begingroup$
    Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
    $endgroup$
    – Nick Alger
    Jul 25 at 16:19

















28














$begingroup$


What is the difference between extrapolation and interpolation, and what is the most precise way of using these terms?



For example, I have seen a statement in a paper using interpolation as:




"The procedure interpolates the shape of the estimated function between the bin points"




A sentence that uses both extrapolation and interpolation is, for example:




The previous step where we extrapolated the interpolated function using the Kernel method to the left and right temperature tails.




Can someone provide a clear and easy way to distinguish them and guide how to use these terms correctly with an example?










share|cite|improve this question












$endgroup$










  • 1




    $begingroup$
    A related question.
    $endgroup$
    – J. M. is not a statistician
    Jul 25 at 7:18






  • 1




    $begingroup$
    Possible duplicate of What is wrong with extrapolation?
    $endgroup$
    – usεr11852 says Reinstate Monic
    Jul 25 at 8:21










  • $begingroup$
    @usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 25 at 11:55










  • $begingroup$
    Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
    $endgroup$
    – Nick Alger
    Jul 25 at 16:19













28












28








28


12



$begingroup$


What is the difference between extrapolation and interpolation, and what is the most precise way of using these terms?



For example, I have seen a statement in a paper using interpolation as:




"The procedure interpolates the shape of the estimated function between the bin points"




A sentence that uses both extrapolation and interpolation is, for example:




The previous step where we extrapolated the interpolated function using the Kernel method to the left and right temperature tails.




Can someone provide a clear and easy way to distinguish them and guide how to use these terms correctly with an example?










share|cite|improve this question












$endgroup$




What is the difference between extrapolation and interpolation, and what is the most precise way of using these terms?



For example, I have seen a statement in a paper using interpolation as:




"The procedure interpolates the shape of the estimated function between the bin points"




A sentence that uses both extrapolation and interpolation is, for example:




The previous step where we extrapolated the interpolated function using the Kernel method to the left and right temperature tails.




Can someone provide a clear and easy way to distinguish them and guide how to use these terms correctly with an example?







terminology interpolation extrapolation






share|cite|improve this question
















share|cite|improve this question













share|cite|improve this question




share|cite|improve this question








edited Jul 23 at 15:20









S. Kolassa - Reinstate Monica

58.2k10 gold badges113 silver badges215 bronze badges




58.2k10 gold badges113 silver badges215 bronze badges










asked Jul 23 at 15:13









Frank SwantonFrank Swanton

2703 silver badges7 bronze badges




2703 silver badges7 bronze badges










  • 1




    $begingroup$
    A related question.
    $endgroup$
    – J. M. is not a statistician
    Jul 25 at 7:18






  • 1




    $begingroup$
    Possible duplicate of What is wrong with extrapolation?
    $endgroup$
    – usεr11852 says Reinstate Monic
    Jul 25 at 8:21










  • $begingroup$
    @usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 25 at 11:55










  • $begingroup$
    Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
    $endgroup$
    – Nick Alger
    Jul 25 at 16:19












  • 1




    $begingroup$
    A related question.
    $endgroup$
    – J. M. is not a statistician
    Jul 25 at 7:18






  • 1




    $begingroup$
    Possible duplicate of What is wrong with extrapolation?
    $endgroup$
    – usεr11852 says Reinstate Monic
    Jul 25 at 8:21










  • $begingroup$
    @usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 25 at 11:55










  • $begingroup$
    Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
    $endgroup$
    – Nick Alger
    Jul 25 at 16:19







1




1




$begingroup$
A related question.
$endgroup$
– J. M. is not a statistician
Jul 25 at 7:18




$begingroup$
A related question.
$endgroup$
– J. M. is not a statistician
Jul 25 at 7:18




1




1




$begingroup$
Possible duplicate of What is wrong with extrapolation?
$endgroup$
– usεr11852 says Reinstate Monic
Jul 25 at 8:21




$begingroup$
Possible duplicate of What is wrong with extrapolation?
$endgroup$
– usεr11852 says Reinstate Monic
Jul 25 at 8:21












$begingroup$
@usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
$endgroup$
– mkt - Reinstate Monica
Jul 25 at 11:55




$begingroup$
@usεr11852 I think the two questions cover similar ground but are different because this one asks for the contrast with interpolation.
$endgroup$
– mkt - Reinstate Monica
Jul 25 at 11:55












$begingroup$
Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
$endgroup$
– Nick Alger
Jul 25 at 16:19




$begingroup$
Has this distinction between interpolation and extrapolation been formalized rigorously in a generally agreed upon way, (e.g., via convex hulls) or are these terms still subject to human judgement and interpretation?
$endgroup$
– Nick Alger
Jul 25 at 16:19










4 Answers
4






active

oldest

votes


















49
















$begingroup$

To add a visual explanation to this: let's consider a few points that you plan to model.



enter image description here



They look like they could be described well with a straight line, so you fit a linear regression to them:



enter image description here



This regression line lets you both interpolate (generate expected values in between your data points) and extrapolate (generate expected values outside the range of your data points). I've highlighted the extrapolation in red and the biggest region of interpolation in blue. To be clear, even the tiny regions between the points are interpolated, but I'm only highlighting the big one here.



enter image description here



Why is extrapolation generally more of a concern? Because you're usually much less certain about the shape of the relationship outside the range of your data. Consider what might happen when you collect a few more data points (hollow circles):



enter image description here



It turns out that the relationship was not captured well with your hypothesized relationship after all. The predictions in the extrapolated region are way off. Even if you had guessed the precise function that describes this nonlinear relationship correctly, your data did not extend over enough of a range for you to capture the nonlinearity well, so you may still have been pretty far off. Note that this is a problem not just for linear regression, but for any relationship at all - this is why extrapolation is considered dangerous.



Predictions in the interpolated region are also incorrect because of the lack of nonlinearity in the fit, but their prediction error is much lower. There's no guarantee that you won't have an unexpected relationship in between your points (i.e. the region of interpolation), but it's generally less likely.




I will add that extrapolation is not always a terrible idea - if you extrapolate a tiny bit outside the range of your data, you're probably not going to be very wrong (though it is possible!). Ancients who had no good scientific model of the world would not have been far wrong if they forecast that the sun would rise again the next day and the day after that (though one day far into the future, even this will fail).



And sometimes, extrapolation can even be informative - for example, simple short-term extrapolations of the exponential increase in atmospheric CO$_2$ have been reasonably accurate over the past few decades. If you were a student who didn't have scientific expertise but wanted a rough, short-term forecast, this would have given you fairly reasonable results. But the farther away from your data you extrapolate, the more likely your prediction is likely to fail, and fail disastrously, as described very nicely in this great thread: What is wrong with extrapolation? (thanks to @J.M.isnotastatistician for reminding me of that).



Edit based on comments: whether interpolating or extrapolating, it's always best to have some theory to ground expectations. If theory-free modelling must be done, the risk from interpolation is usually less than that from extrapolation. That said, as the gap between data points increases in magnitude, interpolation also becomes more and more fraught with risk.






share|cite|improve this answer












$endgroup$










  • 5




    $begingroup$
    I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
    $endgroup$
    – Nick Cox
    Jul 24 at 10:27






  • 1




    $begingroup$
    @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 24 at 11:06










  • $begingroup$
    I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
    $endgroup$
    – leftaroundabout
    Jul 24 at 14:28






  • 1




    $begingroup$
    @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 24 at 15:23






  • 1




    $begingroup$
    Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
    $endgroup$
    – J. M. is not a statistician
    Jul 25 at 7:18


















20
















$begingroup$

In essence interpolation is an operation within the data support, or between existing known data points; extrapolation is beyond the data support. Otherwise put, the criterion is: where are the missing values?



One reason for the distinction is that extrapolation is usually more difficult to do well, and even dangerous, statistically if not practically. That is not always true: for example, river floods may overwhelm the means of measuring discharge or even stage (vertical level), tearing a hole in the measured record. In those circumstances, interpolation of discharge or stage is difficult too and being within the data support does not help much.



In the long run, qualitative change usually supersedes quantitative change. Around 1900 there was much concern that growth in horse-drawn traffic would swamp cities with mostly unwanted excrement. The exponential in excrement was superseded by the internal combustion engine and its different exponentials.




A trend is a trend is a trend,

But the question is, will it bend?

Will it alter its course

Through some unforeseen force

And come to a premature end?



-- Alexander Cairncross



Cairncross, A. 1969. Economic forecasting. The Economic Journal, 79: 797-812. doi:10.2307/2229792 (quotation on p.797)







share|cite|improve this answer












$endgroup$










  • 1




    $begingroup$
    Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
    $endgroup$
    – Nuclear Wang
    Jul 23 at 15:33






  • 1




    $begingroup$
    IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
    $endgroup$
    – leftaroundabout
    Jul 24 at 14:35







  • 1




    $begingroup$
    @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
    $endgroup$
    – mkt - Reinstate Monica
    Jul 24 at 15:41






  • 1




    $begingroup$
    @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
    $endgroup$
    – Cliff AB
    Jul 24 at 17:45






  • 1




    $begingroup$
    This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
    $endgroup$
    – Nick Cox
    Jul 25 at 13:21


















12
















$begingroup$

TL;DR version:




  • Interpolation takes place between existing data points.


  • Extrapolation takes place beyond them.

Mnemonic: interpolation => inside.



FWIW: The prefix inter- means between, and extra- means beyond. Think also of interstate highways which go between states, or extraterrestrials from beyond our planet.






share|cite|improve this answer










$endgroup$






















    1
















    $begingroup$

    Example:



    Study: Want to fit a simple linear regression on the height on the age for girls of age 6-15 years old. Sample size is 100, age is calculated by (date of measuring - date of birth)/365.25.



    After data collection, model is fit and get the estimate of intercept b0 and slope b1. it means we have E(height|age) = b0 + b1*age.



    When you want the mean height for age 13, you find that there is no 15 year old girl in your sample of 100 girls, one f them is 12.83 years old and one is 13.24.



    Now you plug in age = 13 into formula E(height|age) = b0 + b1*age. It is called interpolation because 13 year old is covered by the range of your data used to fit model.



    If you want to get mean height for age 30 use that formula, it is called extrapolation, because age 30 is out of the range of the age covered by your data.



    If the model has several covariates, need to be careful because it is hard to draw the border that data covered.



    In statistics, we do not advocate the extrapolation.






    share|cite|improve this answer










    $endgroup$














      protected by gung - Reinstate Monica Jul 24 at 1:22



      Thank you for your interest in this question.
      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).



      Would you like to answer one of these unanswered questions instead?














      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      49
















      $begingroup$

      To add a visual explanation to this: let's consider a few points that you plan to model.



      enter image description here



      They look like they could be described well with a straight line, so you fit a linear regression to them:



      enter image description here



      This regression line lets you both interpolate (generate expected values in between your data points) and extrapolate (generate expected values outside the range of your data points). I've highlighted the extrapolation in red and the biggest region of interpolation in blue. To be clear, even the tiny regions between the points are interpolated, but I'm only highlighting the big one here.



      enter image description here



      Why is extrapolation generally more of a concern? Because you're usually much less certain about the shape of the relationship outside the range of your data. Consider what might happen when you collect a few more data points (hollow circles):



      enter image description here



      It turns out that the relationship was not captured well with your hypothesized relationship after all. The predictions in the extrapolated region are way off. Even if you had guessed the precise function that describes this nonlinear relationship correctly, your data did not extend over enough of a range for you to capture the nonlinearity well, so you may still have been pretty far off. Note that this is a problem not just for linear regression, but for any relationship at all - this is why extrapolation is considered dangerous.



      Predictions in the interpolated region are also incorrect because of the lack of nonlinearity in the fit, but their prediction error is much lower. There's no guarantee that you won't have an unexpected relationship in between your points (i.e. the region of interpolation), but it's generally less likely.




      I will add that extrapolation is not always a terrible idea - if you extrapolate a tiny bit outside the range of your data, you're probably not going to be very wrong (though it is possible!). Ancients who had no good scientific model of the world would not have been far wrong if they forecast that the sun would rise again the next day and the day after that (though one day far into the future, even this will fail).



      And sometimes, extrapolation can even be informative - for example, simple short-term extrapolations of the exponential increase in atmospheric CO$_2$ have been reasonably accurate over the past few decades. If you were a student who didn't have scientific expertise but wanted a rough, short-term forecast, this would have given you fairly reasonable results. But the farther away from your data you extrapolate, the more likely your prediction is likely to fail, and fail disastrously, as described very nicely in this great thread: What is wrong with extrapolation? (thanks to @J.M.isnotastatistician for reminding me of that).



      Edit based on comments: whether interpolating or extrapolating, it's always best to have some theory to ground expectations. If theory-free modelling must be done, the risk from interpolation is usually less than that from extrapolation. That said, as the gap between data points increases in magnitude, interpolation also becomes more and more fraught with risk.






      share|cite|improve this answer












      $endgroup$










      • 5




        $begingroup$
        I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
        $endgroup$
        – Nick Cox
        Jul 24 at 10:27






      • 1




        $begingroup$
        @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 11:06










      • $begingroup$
        I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:28






      • 1




        $begingroup$
        @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:23






      • 1




        $begingroup$
        Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
        $endgroup$
        – J. M. is not a statistician
        Jul 25 at 7:18















      49
















      $begingroup$

      To add a visual explanation to this: let's consider a few points that you plan to model.



      enter image description here



      They look like they could be described well with a straight line, so you fit a linear regression to them:



      enter image description here



      This regression line lets you both interpolate (generate expected values in between your data points) and extrapolate (generate expected values outside the range of your data points). I've highlighted the extrapolation in red and the biggest region of interpolation in blue. To be clear, even the tiny regions between the points are interpolated, but I'm only highlighting the big one here.



      enter image description here



      Why is extrapolation generally more of a concern? Because you're usually much less certain about the shape of the relationship outside the range of your data. Consider what might happen when you collect a few more data points (hollow circles):



      enter image description here



      It turns out that the relationship was not captured well with your hypothesized relationship after all. The predictions in the extrapolated region are way off. Even if you had guessed the precise function that describes this nonlinear relationship correctly, your data did not extend over enough of a range for you to capture the nonlinearity well, so you may still have been pretty far off. Note that this is a problem not just for linear regression, but for any relationship at all - this is why extrapolation is considered dangerous.



      Predictions in the interpolated region are also incorrect because of the lack of nonlinearity in the fit, but their prediction error is much lower. There's no guarantee that you won't have an unexpected relationship in between your points (i.e. the region of interpolation), but it's generally less likely.




      I will add that extrapolation is not always a terrible idea - if you extrapolate a tiny bit outside the range of your data, you're probably not going to be very wrong (though it is possible!). Ancients who had no good scientific model of the world would not have been far wrong if they forecast that the sun would rise again the next day and the day after that (though one day far into the future, even this will fail).



      And sometimes, extrapolation can even be informative - for example, simple short-term extrapolations of the exponential increase in atmospheric CO$_2$ have been reasonably accurate over the past few decades. If you were a student who didn't have scientific expertise but wanted a rough, short-term forecast, this would have given you fairly reasonable results. But the farther away from your data you extrapolate, the more likely your prediction is likely to fail, and fail disastrously, as described very nicely in this great thread: What is wrong with extrapolation? (thanks to @J.M.isnotastatistician for reminding me of that).



      Edit based on comments: whether interpolating or extrapolating, it's always best to have some theory to ground expectations. If theory-free modelling must be done, the risk from interpolation is usually less than that from extrapolation. That said, as the gap between data points increases in magnitude, interpolation also becomes more and more fraught with risk.






      share|cite|improve this answer












      $endgroup$










      • 5




        $begingroup$
        I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
        $endgroup$
        – Nick Cox
        Jul 24 at 10:27






      • 1




        $begingroup$
        @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 11:06










      • $begingroup$
        I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:28






      • 1




        $begingroup$
        @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:23






      • 1




        $begingroup$
        Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
        $endgroup$
        – J. M. is not a statistician
        Jul 25 at 7:18













      49














      49










      49







      $begingroup$

      To add a visual explanation to this: let's consider a few points that you plan to model.



      enter image description here



      They look like they could be described well with a straight line, so you fit a linear regression to them:



      enter image description here



      This regression line lets you both interpolate (generate expected values in between your data points) and extrapolate (generate expected values outside the range of your data points). I've highlighted the extrapolation in red and the biggest region of interpolation in blue. To be clear, even the tiny regions between the points are interpolated, but I'm only highlighting the big one here.



      enter image description here



      Why is extrapolation generally more of a concern? Because you're usually much less certain about the shape of the relationship outside the range of your data. Consider what might happen when you collect a few more data points (hollow circles):



      enter image description here



      It turns out that the relationship was not captured well with your hypothesized relationship after all. The predictions in the extrapolated region are way off. Even if you had guessed the precise function that describes this nonlinear relationship correctly, your data did not extend over enough of a range for you to capture the nonlinearity well, so you may still have been pretty far off. Note that this is a problem not just for linear regression, but for any relationship at all - this is why extrapolation is considered dangerous.



      Predictions in the interpolated region are also incorrect because of the lack of nonlinearity in the fit, but their prediction error is much lower. There's no guarantee that you won't have an unexpected relationship in between your points (i.e. the region of interpolation), but it's generally less likely.




      I will add that extrapolation is not always a terrible idea - if you extrapolate a tiny bit outside the range of your data, you're probably not going to be very wrong (though it is possible!). Ancients who had no good scientific model of the world would not have been far wrong if they forecast that the sun would rise again the next day and the day after that (though one day far into the future, even this will fail).



      And sometimes, extrapolation can even be informative - for example, simple short-term extrapolations of the exponential increase in atmospheric CO$_2$ have been reasonably accurate over the past few decades. If you were a student who didn't have scientific expertise but wanted a rough, short-term forecast, this would have given you fairly reasonable results. But the farther away from your data you extrapolate, the more likely your prediction is likely to fail, and fail disastrously, as described very nicely in this great thread: What is wrong with extrapolation? (thanks to @J.M.isnotastatistician for reminding me of that).



      Edit based on comments: whether interpolating or extrapolating, it's always best to have some theory to ground expectations. If theory-free modelling must be done, the risk from interpolation is usually less than that from extrapolation. That said, as the gap between data points increases in magnitude, interpolation also becomes more and more fraught with risk.






      share|cite|improve this answer












      $endgroup$



      To add a visual explanation to this: let's consider a few points that you plan to model.



      enter image description here



      They look like they could be described well with a straight line, so you fit a linear regression to them:



      enter image description here



      This regression line lets you both interpolate (generate expected values in between your data points) and extrapolate (generate expected values outside the range of your data points). I've highlighted the extrapolation in red and the biggest region of interpolation in blue. To be clear, even the tiny regions between the points are interpolated, but I'm only highlighting the big one here.



      enter image description here



      Why is extrapolation generally more of a concern? Because you're usually much less certain about the shape of the relationship outside the range of your data. Consider what might happen when you collect a few more data points (hollow circles):



      enter image description here



      It turns out that the relationship was not captured well with your hypothesized relationship after all. The predictions in the extrapolated region are way off. Even if you had guessed the precise function that describes this nonlinear relationship correctly, your data did not extend over enough of a range for you to capture the nonlinearity well, so you may still have been pretty far off. Note that this is a problem not just for linear regression, but for any relationship at all - this is why extrapolation is considered dangerous.



      Predictions in the interpolated region are also incorrect because of the lack of nonlinearity in the fit, but their prediction error is much lower. There's no guarantee that you won't have an unexpected relationship in between your points (i.e. the region of interpolation), but it's generally less likely.




      I will add that extrapolation is not always a terrible idea - if you extrapolate a tiny bit outside the range of your data, you're probably not going to be very wrong (though it is possible!). Ancients who had no good scientific model of the world would not have been far wrong if they forecast that the sun would rise again the next day and the day after that (though one day far into the future, even this will fail).



      And sometimes, extrapolation can even be informative - for example, simple short-term extrapolations of the exponential increase in atmospheric CO$_2$ have been reasonably accurate over the past few decades. If you were a student who didn't have scientific expertise but wanted a rough, short-term forecast, this would have given you fairly reasonable results. But the farther away from your data you extrapolate, the more likely your prediction is likely to fail, and fail disastrously, as described very nicely in this great thread: What is wrong with extrapolation? (thanks to @J.M.isnotastatistician for reminding me of that).



      Edit based on comments: whether interpolating or extrapolating, it's always best to have some theory to ground expectations. If theory-free modelling must be done, the risk from interpolation is usually less than that from extrapolation. That said, as the gap between data points increases in magnitude, interpolation also becomes more and more fraught with risk.







      share|cite|improve this answer















      share|cite|improve this answer




      share|cite|improve this answer








      edited Aug 2 at 9:51

























      answered Jul 23 at 15:42









      mkt - Reinstate Monicamkt - Reinstate Monica

      9,4516 gold badges33 silver badges95 bronze badges




      9,4516 gold badges33 silver badges95 bronze badges










      • 5




        $begingroup$
        I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
        $endgroup$
        – Nick Cox
        Jul 24 at 10:27






      • 1




        $begingroup$
        @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 11:06










      • $begingroup$
        I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:28






      • 1




        $begingroup$
        @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:23






      • 1




        $begingroup$
        Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
        $endgroup$
        – J. M. is not a statistician
        Jul 25 at 7:18












      • 5




        $begingroup$
        I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
        $endgroup$
        – Nick Cox
        Jul 24 at 10:27






      • 1




        $begingroup$
        @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 11:06










      • $begingroup$
        I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:28






      • 1




        $begingroup$
        @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:23






      • 1




        $begingroup$
        Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
        $endgroup$
        – J. M. is not a statistician
        Jul 25 at 7:18







      5




      5




      $begingroup$
      I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
      $endgroup$
      – Nick Cox
      Jul 24 at 10:27




      $begingroup$
      I like your answer, and regard it as complementary to mine and in no sense competing. But a small point, important for some readers, is that red and green are hard for quite a few people to distinguish visually.
      $endgroup$
      – Nick Cox
      Jul 24 at 10:27




      1




      1




      $begingroup$
      @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 11:06




      $begingroup$
      @NickCox Good point, thank you for raising that - I've now changed the colour scheme.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 11:06












      $begingroup$
      I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
      $endgroup$
      – leftaroundabout
      Jul 24 at 14:28




      $begingroup$
      I don't like this answer very much. The fact that your example data is just two blobs means (or at least it could mean; it's hard to be confident with only six sample points) that interpolating between them could also be considered extrapolating outside of the data support. And it would be quite conceivable for the extra data points to diverge strongly in that region. There might actually be a better quadratic model. If you're going to interpolate in such a situation, you'd better had an a-priori reason for preferring a particular model. And that is also the case for your CO₂ example!
      $endgroup$
      – leftaroundabout
      Jul 24 at 14:28




      1




      1




      $begingroup$
      @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 15:23




      $begingroup$
      @leftaroundabout My point was that the Keeling curve pattern is so strong that extrapolations ignoring economics & physics are still reasonably accurate on the scale of years to a few decades. I noted 'past few decades' precisely because that's the time scale on which we have had high-resolution measurements. This is an example where extrapolation would not have led you badly wrong and I think that's worth noting. I think it would take wilful misreading to claim that this answer is advocating theory-free extrapolation.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 15:23




      1




      1




      $begingroup$
      Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
      $endgroup$
      – J. M. is not a statistician
      Jul 25 at 7:18




      $begingroup$
      Relatedly, I gave Taleb's "turkey example" in this answer as a warning for people who use extrapolation.
      $endgroup$
      – J. M. is not a statistician
      Jul 25 at 7:18













      20
















      $begingroup$

      In essence interpolation is an operation within the data support, or between existing known data points; extrapolation is beyond the data support. Otherwise put, the criterion is: where are the missing values?



      One reason for the distinction is that extrapolation is usually more difficult to do well, and even dangerous, statistically if not practically. That is not always true: for example, river floods may overwhelm the means of measuring discharge or even stage (vertical level), tearing a hole in the measured record. In those circumstances, interpolation of discharge or stage is difficult too and being within the data support does not help much.



      In the long run, qualitative change usually supersedes quantitative change. Around 1900 there was much concern that growth in horse-drawn traffic would swamp cities with mostly unwanted excrement. The exponential in excrement was superseded by the internal combustion engine and its different exponentials.




      A trend is a trend is a trend,

      But the question is, will it bend?

      Will it alter its course

      Through some unforeseen force

      And come to a premature end?



      -- Alexander Cairncross



      Cairncross, A. 1969. Economic forecasting. The Economic Journal, 79: 797-812. doi:10.2307/2229792 (quotation on p.797)







      share|cite|improve this answer












      $endgroup$










      • 1




        $begingroup$
        Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
        $endgroup$
        – Nuclear Wang
        Jul 23 at 15:33






      • 1




        $begingroup$
        IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:35







      • 1




        $begingroup$
        @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:41






      • 1




        $begingroup$
        @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
        $endgroup$
        – Cliff AB
        Jul 24 at 17:45






      • 1




        $begingroup$
        This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
        $endgroup$
        – Nick Cox
        Jul 25 at 13:21















      20
















      $begingroup$

      In essence interpolation is an operation within the data support, or between existing known data points; extrapolation is beyond the data support. Otherwise put, the criterion is: where are the missing values?



      One reason for the distinction is that extrapolation is usually more difficult to do well, and even dangerous, statistically if not practically. That is not always true: for example, river floods may overwhelm the means of measuring discharge or even stage (vertical level), tearing a hole in the measured record. In those circumstances, interpolation of discharge or stage is difficult too and being within the data support does not help much.



      In the long run, qualitative change usually supersedes quantitative change. Around 1900 there was much concern that growth in horse-drawn traffic would swamp cities with mostly unwanted excrement. The exponential in excrement was superseded by the internal combustion engine and its different exponentials.




      A trend is a trend is a trend,

      But the question is, will it bend?

      Will it alter its course

      Through some unforeseen force

      And come to a premature end?



      -- Alexander Cairncross



      Cairncross, A. 1969. Economic forecasting. The Economic Journal, 79: 797-812. doi:10.2307/2229792 (quotation on p.797)







      share|cite|improve this answer












      $endgroup$










      • 1




        $begingroup$
        Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
        $endgroup$
        – Nuclear Wang
        Jul 23 at 15:33






      • 1




        $begingroup$
        IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:35







      • 1




        $begingroup$
        @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:41






      • 1




        $begingroup$
        @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
        $endgroup$
        – Cliff AB
        Jul 24 at 17:45






      • 1




        $begingroup$
        This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
        $endgroup$
        – Nick Cox
        Jul 25 at 13:21













      20














      20










      20







      $begingroup$

      In essence interpolation is an operation within the data support, or between existing known data points; extrapolation is beyond the data support. Otherwise put, the criterion is: where are the missing values?



      One reason for the distinction is that extrapolation is usually more difficult to do well, and even dangerous, statistically if not practically. That is not always true: for example, river floods may overwhelm the means of measuring discharge or even stage (vertical level), tearing a hole in the measured record. In those circumstances, interpolation of discharge or stage is difficult too and being within the data support does not help much.



      In the long run, qualitative change usually supersedes quantitative change. Around 1900 there was much concern that growth in horse-drawn traffic would swamp cities with mostly unwanted excrement. The exponential in excrement was superseded by the internal combustion engine and its different exponentials.




      A trend is a trend is a trend,

      But the question is, will it bend?

      Will it alter its course

      Through some unforeseen force

      And come to a premature end?



      -- Alexander Cairncross



      Cairncross, A. 1969. Economic forecasting. The Economic Journal, 79: 797-812. doi:10.2307/2229792 (quotation on p.797)







      share|cite|improve this answer












      $endgroup$



      In essence interpolation is an operation within the data support, or between existing known data points; extrapolation is beyond the data support. Otherwise put, the criterion is: where are the missing values?



      One reason for the distinction is that extrapolation is usually more difficult to do well, and even dangerous, statistically if not practically. That is not always true: for example, river floods may overwhelm the means of measuring discharge or even stage (vertical level), tearing a hole in the measured record. In those circumstances, interpolation of discharge or stage is difficult too and being within the data support does not help much.



      In the long run, qualitative change usually supersedes quantitative change. Around 1900 there was much concern that growth in horse-drawn traffic would swamp cities with mostly unwanted excrement. The exponential in excrement was superseded by the internal combustion engine and its different exponentials.




      A trend is a trend is a trend,

      But the question is, will it bend?

      Will it alter its course

      Through some unforeseen force

      And come to a premature end?



      -- Alexander Cairncross



      Cairncross, A. 1969. Economic forecasting. The Economic Journal, 79: 797-812. doi:10.2307/2229792 (quotation on p.797)








      share|cite|improve this answer















      share|cite|improve this answer




      share|cite|improve this answer








      edited Jul 26 at 11:24









      amoeba says Reinstate Monica

      68.2k19 gold badges231 silver badges280 bronze badges




      68.2k19 gold badges231 silver badges280 bronze badges










      answered Jul 23 at 15:28









      Nick CoxNick Cox

      41.3k6 gold badges93 silver badges137 bronze badges




      41.3k6 gold badges93 silver badges137 bronze badges










      • 1




        $begingroup$
        Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
        $endgroup$
        – Nuclear Wang
        Jul 23 at 15:33






      • 1




        $begingroup$
        IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:35







      • 1




        $begingroup$
        @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:41






      • 1




        $begingroup$
        @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
        $endgroup$
        – Cliff AB
        Jul 24 at 17:45






      • 1




        $begingroup$
        This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
        $endgroup$
        – Nick Cox
        Jul 25 at 13:21












      • 1




        $begingroup$
        Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
        $endgroup$
        – Nuclear Wang
        Jul 23 at 15:33






      • 1




        $begingroup$
        IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
        $endgroup$
        – leftaroundabout
        Jul 24 at 14:35







      • 1




        $begingroup$
        @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
        $endgroup$
        – mkt - Reinstate Monica
        Jul 24 at 15:41






      • 1




        $begingroup$
        @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
        $endgroup$
        – Cliff AB
        Jul 24 at 17:45






      • 1




        $begingroup$
        This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
        $endgroup$
        – Nick Cox
        Jul 25 at 13:21







      1




      1




      $begingroup$
      Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
      $endgroup$
      – Nuclear Wang
      Jul 23 at 15:33




      $begingroup$
      Good answer. The interpretation is right there in the name - interpolation = to smooth within, extrapolation = to smooth beyond.
      $endgroup$
      – Nuclear Wang
      Jul 23 at 15:33




      1




      1




      $begingroup$
      IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
      $endgroup$
      – leftaroundabout
      Jul 24 at 14:35





      $begingroup$
      IMO this is the correct answer. “Data support” is the crucial bit; even if the point you want to go is between two measured ones then it may still lie outside the data support. For example, if you have prosperity data for people in the Roman antiquity and from the modern day, but not in between, then interpolating into the middle ages would be very problematic. I'd call this extrapolation. OTOH, if you have data scattered sparsely but uniformly through the entire time span, then interpolating to a particular year is much more plausible.
      $endgroup$
      – leftaroundabout
      Jul 24 at 14:35





      1




      1




      $begingroup$
      @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 15:41




      $begingroup$
      @leftaroundabout Just because interpolation may be done over a huge gap in data does not make it extrapolation. You're mistaking the advisability of the procedure for the procedure itself. Sometimes interpolation is a bad idea too.
      $endgroup$
      – mkt - Reinstate Monica
      Jul 24 at 15:41




      1




      1




      $begingroup$
      @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
      $endgroup$
      – Cliff AB
      Jul 24 at 17:45




      $begingroup$
      @mkt: I'm going to side with leftaroundabout that his first example could be considered extrapolation, as interpolation vs extrapolation isn't really as well defined as we may want to think. A simple transformation of variables can turn interpolation into extrapolation. In his example, using something like distance functions instead of raw time means that while in raw time we are interpolating, in distances we are extrapolating...and using raw times would probably be a bad idea.
      $endgroup$
      – Cliff AB
      Jul 24 at 17:45




      1




      1




      $begingroup$
      This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
      $endgroup$
      – Nick Cox
      Jul 25 at 13:21




      $begingroup$
      This is my answer. I don't feel the need to qualify it. A broad distinction between interpolation and extrapolation doesn't rule it out being a little difficult to decide which is being undertaken. If you have a big hole in the middle of the data space, labelling could go either way. As some wag pointed out, the fact that the end of the day and the beginning of the night blur into one another doesn't make the distinction between day and night pointless or useless.
      $endgroup$
      – Nick Cox
      Jul 25 at 13:21











      12
















      $begingroup$

      TL;DR version:




      • Interpolation takes place between existing data points.


      • Extrapolation takes place beyond them.

      Mnemonic: interpolation => inside.



      FWIW: The prefix inter- means between, and extra- means beyond. Think also of interstate highways which go between states, or extraterrestrials from beyond our planet.






      share|cite|improve this answer










      $endgroup$



















        12
















        $begingroup$

        TL;DR version:




        • Interpolation takes place between existing data points.


        • Extrapolation takes place beyond them.

        Mnemonic: interpolation => inside.



        FWIW: The prefix inter- means between, and extra- means beyond. Think also of interstate highways which go between states, or extraterrestrials from beyond our planet.






        share|cite|improve this answer










        $endgroup$

















          12














          12










          12







          $begingroup$

          TL;DR version:




          • Interpolation takes place between existing data points.


          • Extrapolation takes place beyond them.

          Mnemonic: interpolation => inside.



          FWIW: The prefix inter- means between, and extra- means beyond. Think also of interstate highways which go between states, or extraterrestrials from beyond our planet.






          share|cite|improve this answer










          $endgroup$



          TL;DR version:




          • Interpolation takes place between existing data points.


          • Extrapolation takes place beyond them.

          Mnemonic: interpolation => inside.



          FWIW: The prefix inter- means between, and extra- means beyond. Think also of interstate highways which go between states, or extraterrestrials from beyond our planet.







          share|cite|improve this answer













          share|cite|improve this answer




          share|cite|improve this answer










          answered Jul 24 at 1:02









          A CA C

          2211 silver badge4 bronze badges




          2211 silver badge4 bronze badges
























              1
















              $begingroup$

              Example:



              Study: Want to fit a simple linear regression on the height on the age for girls of age 6-15 years old. Sample size is 100, age is calculated by (date of measuring - date of birth)/365.25.



              After data collection, model is fit and get the estimate of intercept b0 and slope b1. it means we have E(height|age) = b0 + b1*age.



              When you want the mean height for age 13, you find that there is no 15 year old girl in your sample of 100 girls, one f them is 12.83 years old and one is 13.24.



              Now you plug in age = 13 into formula E(height|age) = b0 + b1*age. It is called interpolation because 13 year old is covered by the range of your data used to fit model.



              If you want to get mean height for age 30 use that formula, it is called extrapolation, because age 30 is out of the range of the age covered by your data.



              If the model has several covariates, need to be careful because it is hard to draw the border that data covered.



              In statistics, we do not advocate the extrapolation.






              share|cite|improve this answer










              $endgroup$



















                1
















                $begingroup$

                Example:



                Study: Want to fit a simple linear regression on the height on the age for girls of age 6-15 years old. Sample size is 100, age is calculated by (date of measuring - date of birth)/365.25.



                After data collection, model is fit and get the estimate of intercept b0 and slope b1. it means we have E(height|age) = b0 + b1*age.



                When you want the mean height for age 13, you find that there is no 15 year old girl in your sample of 100 girls, one f them is 12.83 years old and one is 13.24.



                Now you plug in age = 13 into formula E(height|age) = b0 + b1*age. It is called interpolation because 13 year old is covered by the range of your data used to fit model.



                If you want to get mean height for age 30 use that formula, it is called extrapolation, because age 30 is out of the range of the age covered by your data.



                If the model has several covariates, need to be careful because it is hard to draw the border that data covered.



                In statistics, we do not advocate the extrapolation.






                share|cite|improve this answer










                $endgroup$

















                  1














                  1










                  1







                  $begingroup$

                  Example:



                  Study: Want to fit a simple linear regression on the height on the age for girls of age 6-15 years old. Sample size is 100, age is calculated by (date of measuring - date of birth)/365.25.



                  After data collection, model is fit and get the estimate of intercept b0 and slope b1. it means we have E(height|age) = b0 + b1*age.



                  When you want the mean height for age 13, you find that there is no 15 year old girl in your sample of 100 girls, one f them is 12.83 years old and one is 13.24.



                  Now you plug in age = 13 into formula E(height|age) = b0 + b1*age. It is called interpolation because 13 year old is covered by the range of your data used to fit model.



                  If you want to get mean height for age 30 use that formula, it is called extrapolation, because age 30 is out of the range of the age covered by your data.



                  If the model has several covariates, need to be careful because it is hard to draw the border that data covered.



                  In statistics, we do not advocate the extrapolation.






                  share|cite|improve this answer










                  $endgroup$



                  Example:



                  Study: Want to fit a simple linear regression on the height on the age for girls of age 6-15 years old. Sample size is 100, age is calculated by (date of measuring - date of birth)/365.25.



                  After data collection, model is fit and get the estimate of intercept b0 and slope b1. it means we have E(height|age) = b0 + b1*age.



                  When you want the mean height for age 13, you find that there is no 15 year old girl in your sample of 100 girls, one f them is 12.83 years old and one is 13.24.



                  Now you plug in age = 13 into formula E(height|age) = b0 + b1*age. It is called interpolation because 13 year old is covered by the range of your data used to fit model.



                  If you want to get mean height for age 30 use that formula, it is called extrapolation, because age 30 is out of the range of the age covered by your data.



                  If the model has several covariates, need to be careful because it is hard to draw the border that data covered.



                  In statistics, we do not advocate the extrapolation.







                  share|cite|improve this answer













                  share|cite|improve this answer




                  share|cite|improve this answer










                  answered Jul 23 at 15:34









                  user158565user158565

                  6,4622 gold badges6 silver badges18 bronze badges




                  6,4622 gold badges6 silver badges18 bronze badges


















                      protected by gung - Reinstate Monica Jul 24 at 1:22



                      Thank you for your interest in this question.
                      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).



                      Would you like to answer one of these unanswered questions instead?



                      Popular posts from this blog

                      Distance measures on a map of a game The 2019 Stack Overflow Developer Survey Results Are Inmin distance in a graphShortest distance path on contour plotHow to plot a tilted map?Finding points outside of a diskDelaunay link distanceAnnulus from GeoDisks: drawing a ring on a mapNegative Correlation DistanceFind distance along a path (GPS coordinates)Finding position at given distance in a GeoPathMathematics behind distance estimation using camera

                      How to get a smooth, uniform ParametricPlot of a 2D Region?How to plot a complicated Region?How to exclude a region from ParametricPlotHow discretize a region placing vertices on a specific non-uniform gridHow to transform a Plot or a ParametricPlot into a RegionHow can I get a smooth plot of a bounded region?Smooth ParametricPlot3D with RegionFunction?Smooth border of a region ParametricPlotSmooth region boundarySmooth region plot from list of pointsGet minimum y of a certain x in a region

                      Genealogie vun de Merowenger Vum Merowech bis zum Chilperich I. | Navigatiounsmenü