Why would they pick a gamma distribution here?


In one of the exercises for my course, we're using a Kaggle medical dataset.

The exercise says:

"we want to model the distribution of individual charges and we also really want to be able to capture our uncertainty about that distribution so we can better capture the range of values we might see. Loading the data and performing an initial view:"

[plot]

"We may suspect from the above that there is some sort of exponential-like distribution at play here. ... The insurance claim charges may possibly be multimodal. The gamma distribution may be applicable and we could test this for the distribution of charges that weren't insurance claims first."

I looked up "Gamma distribution" and found: "a continuous, positive-only, unimodal distribution that encodes the time required for «alpha» events to occur in a Poisson process with mean arrival time of «beta»".

There's no time involved here, just unrelated charges, either insured or not.

Why would they choose a gamma distribution?







      gamma-distribution






asked Sep 29 at 21:53
Vicki B























          1 Answer
When you're considering simple parametric models for the conditional distribution of data (i.e., the distribution of each group, or the expected distribution for each combination of predictor variables), and you are dealing with a positive continuous variable, the two most common choices are the Gamma and the log-Normal. Besides having the right domain (real numbers greater than zero), these distributions are computationally convenient and often make mechanistic sense.
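To make the "computationally convenient" point concrete, here is a minimal sketch of fitting both candidates to positive data and comparing their log-likelihoods. Everything in it is an assumption for illustration: the `charges` values are simulated rather than taken from the Kaggle dataset, the Gamma is fitted by the method of moments (a shortcut, not maximum likelihood), and the helper names are hypothetical.

```python
import math
import random
import statistics


def fit_gamma_moments(xs):
    """Method-of-moments estimates (shape, scale) for a Gamma distribution."""
    mean = statistics.fmean(xs)
    var = statistics.pvariance(xs)
    return mean * mean / var, var / mean


def fit_lognormal_mle(xs):
    """Maximum-likelihood estimates (mu, sigma) of the log-Normal, on the log scale."""
    logs = [math.log(x) for x in xs]
    return statistics.fmean(logs), statistics.pstdev(logs)


def gamma_loglik(xs, shape, scale):
    """Log-likelihood of the data under Gamma(shape, scale)."""
    const = math.lgamma(shape) + shape * math.log(scale)
    return sum((shape - 1.0) * math.log(x) - x / scale - const for x in xs)


def lognormal_loglik(xs, mu, sigma):
    """Log-likelihood of the data under log-Normal(mu, sigma)."""
    const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(
        -math.log(x) - const - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in xs
    )


rng = random.Random(1)
# Hypothetical positive "charges": simulated from a Gamma, NOT the Kaggle data.
charges = [rng.gammavariate(2.0, 3000.0) for _ in range(5000)]

shape, scale = fit_gamma_moments(charges)
mu, sigma = fit_lognormal_mle(charges)
gamma_ll = gamma_loglik(charges, shape, scale)
lognormal_ll = lognormal_loglik(charges, mu, sigma)
print(f"Gamma:      shape={shape:.2f}, scale={scale:.0f}, loglik={gamma_ll:.1f}")
print(f"log-Normal: mu={mu:.2f}, sigma={sigma:.2f}, loglik={lognormal_ll:.1f}")
```

Because the simulated data really are Gamma here, the Gamma fit should come out ahead; on real charges the comparison could go either way, which is the reason to run it.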



• The log-Normal distribution is easily derived by exponentiating a Normal distribution (conversely, log-transforming log-Normal deviates gives Normal deviates). From a mechanistic point of view, the log-Normal arises via the Central Limit Theorem when each observation reflects the product of a large number of iid positive random variables, because the log of the product is then a sum of iid terms. Once you've log-transformed the data, you have access to a huge variety of computational and analytical tools (e.g., anything assuming Normality or using least-squares methods).

• As your question points out, one way that a Gamma distribution arises is as the distribution of the waiting time until $n$ independent events occur, when events arrive at a constant rate $\lambda$. I can't easily find a reference for a mechanistic model of Gamma distributions of insurance claims, but it also makes sense to use a Gamma distribution from a phenomenological (i.e., data description/computational convenience) point of view. The Gamma distribution belongs to the exponential family in the sense required by generalized linear models (the Normal does too; the log-Normal does not), which means that all of the machinery of generalized linear models is available; it also has a particularly convenient form for analysis.
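The two mechanistic stories in the bullets above can be checked by simulation. The sketch below is only an illustration under assumed settings: 5 events at rate 2 for the waiting-time construction, and Uniform(0.5, 1.5) factors for the product construction (both arbitrary choices, not taken from the exercise). It compares the simulated waiting times with the Gamma mean $n/\lambda$ and variance $n/\lambda^2$, and checks that products of many positive factors are strongly skewed while their logs are roughly symmetric.

```python
import math
import random
import statistics


def waiting_times(n_events, rate, n_samples, rng):
    """Total waiting time for n_events arrivals, each gap Exponential(rate)."""
    return [
        sum(rng.expovariate(rate) for _ in range(n_events))
        for _ in range(n_samples)
    ]


def products_of_factors(n_factors, n_samples, rng):
    """Product of iid positive factors; Uniform(0.5, 1.5) is an arbitrary choice."""
    return [
        math.prod(rng.uniform(0.5, 1.5) for _ in range(n_factors))
        for _ in range(n_samples)
    ]


def skewness(xs):
    """Sample skewness (third standardized moment)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


rng = random.Random(0)

# Sum of 5 exponential gaps at rate 2: Gamma with mean 5/2 and variance 5/4.
waits = waiting_times(n_events=5, rate=2.0, n_samples=20000, rng=rng)
wait_mean = statistics.fmean(waits)
wait_var = statistics.pvariance(waits)
print(f"waiting times: mean={wait_mean:.3f} (theory 2.5), var={wait_var:.3f} (theory 1.25)")

# Product of 50 positive factors: skewed on the raw scale, near-symmetric on the log scale.
products = products_of_factors(n_factors=50, n_samples=5000, rng=rng)
raw_skew = skewness(products)
log_skew = skewness([math.log(p) for p in products])
print(f"products: skewness raw={raw_skew:.2f}, after log={log_skew:.2f}")
```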

There are other reasons one might pick one or the other, for example the "heaviness" of the tail of the distribution, which might be important in predicting the frequency of extreme events. There are plenty of other positive, continuous distributions (e.g., see this list), but they tend to be used in more specialized applications.
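As a rough illustration of the tail-heaviness point, the sketch below draws from a Gamma and a log-Normal that share the same mean and coefficient of variation and compares their upper quantiles by Monte Carlo. The mean of 1 and coefficient of variation of 1 are assumed values chosen for convenience, and the function names are hypothetical.

```python
import math
import random


def matched_parameters(mean, cv):
    """Gamma (shape, scale) and log-Normal (mu, sigma) with the same mean and CV."""
    shape = 1.0 / cv ** 2
    scale = mean * cv ** 2
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return (shape, scale), (mu, math.sqrt(sigma2))


def empirical_quantile(samples, p):
    """Simple order-statistic estimate of the p-th quantile."""
    ordered = sorted(samples)
    return ordered[int(p * (len(ordered) - 1))]


rng = random.Random(2)
(shape, scale), (mu, sigma) = matched_parameters(mean=1.0, cv=1.0)

n = 100_000
gamma_draws = [rng.gammavariate(shape, scale) for _ in range(n)]
lognormal_draws = [rng.lognormvariate(mu, sigma) for _ in range(n)]

for p in (0.5, 0.9, 0.99, 0.999):
    g = empirical_quantile(gamma_draws, p)
    ln = empirical_quantile(lognormal_draws, p)
    print(f"p={p}: Gamma quantile={g:.2f}, log-Normal quantile={ln:.2f}")
```

With the first two moments matched, the far upper quantiles of the log-Normal should sit above those of the Gamma, so the choice matters most when extreme charges are the quantity of interest.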



Very few of these distributions will capture the multimodality you see in the marginal distributions above, but multimodality may be explained by the data being grouped into categories described by observed categorical predictors. If there are no observable predictors that explain the multimodality, one might choose to fit a finite mixture model, i.e., a mixture of a small number of positive continuous distributions.
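A finite mixture of the kind described above can be sketched with a short EM loop. This example makes several assumptions that are not in the original answer: it uses two log-Normal components (chosen because EM then reduces to a Gaussian mixture on the log scale, whereas Gamma components would need a numerical M-step), the data are simulated rather than the Kaggle charges, and the starting values are simple quartile guesses.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_lognormal_mixture(values, n_iter=100):
    """EM for a two-component log-Normal mixture, run on the logged data."""
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    # Crude starting values: lower and upper quartiles, common spread, equal weights.
    mu = [logs[n // 4], logs[3 * n // 4]]
    sigma = [statistics.pstdev(logs)] * 2
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in logs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp.append((p0 / total, p1 / total))
        # M-step: weighted updates of the mixing weights, means, and spreads.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, logs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, logs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)  # guard against a collapsing component
    return weight, mu, sigma


rng = random.Random(3)
# Hypothetical bimodal charges: 70% from a lower group, 30% from a higher one.
charges = [
    rng.lognormvariate(8.0, 0.5) if rng.random() < 0.7 else rng.lognormvariate(10.5, 0.4)
    for _ in range(2000)
]

weights, mus, sigmas = fit_two_lognormal_mixture(charges)
for k in range(2):
    print(f"component {k}: weight={weights[k]:.2f}, mu={mus[k]:.2f}, sigma={sigmas[k]:.2f}")
```

In practice one would first check whether an observed predictor already separates the modes, as the paragraph above suggests, and only fall back on a latent mixture if it does not.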







answered Sep 29 at 23:46 (edited Sep 30 at 0:51)
Ben Bolker










• Also worth noting that gamma and lognormal models almost always give very similar results.
– carlo, Sep 30 at 15:18

• I work in health services research. I can confirm that, in general, a gamma or lognormal distribution would be an appropriate choice for a model of healthcare spending or claim amounts. The gamma distribution can be used in time-to-event models, but those aren't applicable here.
– Weiwen Ng, Sep 30 at 19:35

• Thanks!! This was very helpful.
– Vicki B, Oct 1 at 0:07












