How to fit a linear model in the Bayesian way in Mathematica?


Basically, I'm looking for the Bayesian equivalent of LinearModelFit. At the time of writing, Mathematica has no real (documented) built-in functionality for Bayesian fitting of data, but for linear regression there exist closed-form solutions for the posterior coefficient distributions and the posterior predictive distributions. Have these been implemented somewhere?










probability-or-statistics fitting linear-algebra bayesian

asked Jul 14 at 14:34 by Sjoerd Smit

  • It might be of interest to look at the option FitRegularization in FindFit.
    – chris
    Jul 14 at 15:18











  • @chris I'm aware of the new fit regularization options. In particular, fitting with L2 regularization can be considered an approximation to a MAP estimate with the appropriate priors. BayesianLinearRegression is meant to implement the full Bayesian treatment, not an approximation.
    – Sjoerd Smit
    Jul 14 at 15:26











  • OK, I am not trying to lower the interest of what you did. I believe MAP falls within the Bayesian framework. If I use, say, a Gaussian prior, the MAP gives an exact solution, not an approximation. I guess this is just semantics, though.
    – chris
    Jul 14 at 15:39






  • @chris It depends on what you call "the solution". If you ask me, the "solution" in a Bayesian setting is always a distribution, not a value or point estimate. A MAP estimate is a reduction of said distribution to a single value, which amounts to throwing away information. However, if you want to make a MAP estimate using my code, that's easy: all of the distributions are there. Just reduce them to numbers any way you want. Mean, median, mode: your choice.
    – Sjoerd Smit
    Jul 14 at 15:44











  • For Gaussian noise with a Gaussian prior, the MAP solution is a description of the Gaussian posterior, which is an exhaustive description of the corresponding distribution. In any case, I think people reading your post would also be interested in knowing that Mathematica now has FitRegularization built in. The rest is linguistics, IMHO.
    – chris
    Jul 14 at 16:53


















1 Answer

I submitted this question to answer it myself, since I recently updated my Bayesian inference repository on GitHub with a function called BayesianLinearRegression that does just this. I wrote a general introduction to its functionalities on the Wolfram Community, and the example notebook on GitHub shows some more advanced uses of the function. I also submitted the function to the Wolfram Function Repository; you can use that version by replacing BayesianLinearRegression with ResourceFunction["BayesianLinearRegression"] in the examples below (or just evaluating BayesianLinearRegression = ResourceFunction["BayesianLinearRegression"] once).
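
For example, this one-liner (just the substitution described above) lets you run everything below without installing the package:

BayesianLinearRegression = ResourceFunction["BayesianLinearRegression"];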



Please refer to the README.md file (shown on the front page of the repository link) for instructions on the installation of the BayesianInference package. If you don't want the whole package, you can also get the code for BayesianLinearRegression directly from the relevant package file.



Example of use



BayesianLinearRegression uses the same syntax as LinearModelFit. In addition, it also supports Rule-based definitions of input-output data as used by Predict (i.e., data of the form {x1 -> y1, ...} or {{x1, x2, ...} -> {y1, y2, ...}, ...}). This format is particularly useful for multivariate regression (i.e., when the y values are vectors), which is also supported by BayesianLinearRegression.
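
As a small illustration of the conversion between the two formats (plain Wolfram Language; the same idiom is used again further below):

Rule @@@ {{1., 2.}, {3., 4.}}  (* gives {1. -> 2., 3. -> 4.} *)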



The output of the function is an Association with all relevant information about the fit.



data = {
   {-1.5`, -1.375`}, {-1.34375`, -2.375`}, {1.5`, 0.21875`}, {1.03125`, 0.6875`}, {-0.5`, -0.59375`}, {-1.875`, -2.59375`}, {1.625`, 1.1875`},
   {-2.0625`, -1.875`}, {1.0625`, 0.5`}, {-0.4375`, -0.28125`}, {-0.75`, -0.75`}, {2.125`, 0.375`}, {0.4375`, 0.6875`}, {-1.3125`, -0.75`}, {-1.125`, -0.21875`},
   {0.625`, 0.40625`}, {-0.25`, 0.59375`}, {-1.875`, -1.625`}, {-1.`, -0.8125`}, {0.4375`, -0.09375`}
   };
Clear[x];
model = BayesianLinearRegression[data, {1, x}, x];
Keys[model]



Out[21]= {"LogEvidence", "PriorParameters", "PosteriorParameters", "Posterior", "Prior", "Basis", "IndependentVariables"}




The posterior predictive distribution is specified as an x-dependent probability distribution:



model["Posterior", "PredictiveDistribution"]



Out[15]= StudentTDistribution[-0.248878 + 0.714688 x, 0.555877 Sqrt[1.05211 + 0.0164952 x + 0.031814 x^2], 2001/100]
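
Because the distribution depends on x symbolically, substituting a numeric value gives an ordinary distribution object. A small usage sketch (assuming only the model object constructed above):

dist = model["Posterior", "PredictiveDistribution"] /. x -> 1.;
{Mean[dist], InverseCDF[dist, {0.05, 0.95}]} (* point prediction and 90% prediction interval at x == 1 *)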




Show the single prediction bands:



With[{
   predictiveDist = model["Posterior", "PredictiveDistribution"],
   bands = {95, 50, 5}
   },
 Show[
  Plot[
   Evaluate @ InverseCDF[predictiveDist, bands/100],
   {x, -4, 4}, Filling -> {1 -> {2}, 3 -> {2}}, PlotLegends -> bands
   ],
  ListPlot[data],
  PlotRange -> All
  ]
 ]


Prediction bands of 1st degree fit



It looks like the data could also be fitted with a quadratic. To test this, compute the log-evidence for polynomial models up to degree 4 and rank them:



In[152]:= models = AssociationMap[
BayesianLinearRegression[Rule @@@ data, x^Range[0, #], x] &,
Range[0, 4]
];
ReverseSort @ models[[All, "LogEvidence"]]



Out[153]= <|1 -> -30.0072, 2 -> -30.1774, 3 -> -34.4292, 4 -> -38.7037, 0 -> -38.787|>




The evidences for the first and second degree models are almost equal. In this case, it may be more appropriate to define a mixture of the models with weights derived from their evidences:



weights = Normalize[
   (* subtract the max to reduce rounding error *)
   Exp[models[[All, "LogEvidence"]] - Max[models[[All, "LogEvidence"]]]],
   Total
   ];
mixDist = MixtureDistribution[
   Values[weights],
   Values @ models[[All, "Posterior", "PredictiveDistribution"]]
   ];
Show[
 (* regressionPlot1D is a utility function from the package *)
 regressionPlot1D[mixDist, {x, -3, 3}],
 ListPlot[data]
 ]


Prediction bands of a mixture model of polynomials up to degree 4



As you can see, the mixture model shows features of both the first and second degree fits to the data.
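
For reference, the weights computed above are the normalized evidences, i.e., the posterior model probabilities under a uniform prior over the candidate models: $w_i = \exp(\mathcal{L}_i)/\sum_j \exp(\mathcal{L}_j)$, where $\mathcal{L}_i$ is the log-evidence of model $i$. Subtracting $\max_j \mathcal{L}_j$ before exponentiating leaves the $w_i$ unchanged while avoiding underflow.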



Please refer to the example notebook for information about specification of priors (see the section "Options of BayesianLinearRegression" -> "PriorParameters") and multivariate regression.



Detailed explanation about the returned values



For purposes of illustration, consider a simple model like:



y == a + b x + eps


with eps distributed as NormalDistribution[0, sigma]. This model is fitted with BayesianLinearRegression[data, {1, x}, x]. Here is an explanation of the keys in the returned Association:



  • "LogEvidence": In a Bayesian setting, the evidence (also called
    marginal likelihood) measures how well the model fits the data (with
    a higher evidence indicating a better fit). The evidence has the
    virtue that it naturally penalizes models for their complexity and
    therefore does not suffer from over-fitting in the way that measures
    like the sum-of-squares or (log-)likelihood do.


  • "Basis", "IndependentVariables": Simply the basis functions and independent variable specified by the user.



  • "Posterior", "Prior": These two keys each hold an association with 4 distributions:



    1. "PredictiveDistribution": A distribution that depends on the
      independent variables (x in the example above). By filling in a value
      for x, you get a distribution that tells you where you could expect
      to find future y values. This distribution accounts for all relevant
      uncertainties in the model: model variance caused by the term
      eps; uncertainty in the values of a and b; and uncertainty in
      sigma.


    2. "UnderlyingValueDistribution": Similar to "PredictiveDistribution", but this distribution give the possible values of a + b x without the eps error term.


    3. "RegressionCoefficientDistribution": The join distribution over a and b.


    4. "ErrorDistribution": The distribution of the variance sigma^2.



  • "PriorParameters", "PosteriorParameters": These parameters are not immediately important most of the time, but they contain all of the relevant information about the fit.


People familiar with Bayesian analysis may note that one distribution is absent: the full joint distribution over a, b and sigma all together (I only gave the marginals over a and b on one hand and sigma on the other). This is because Mathematica doesn't really offer a convenient framework for representing this distribution, unfortunately.



Sources of formulas used:



  • https://en.wikipedia.org/wiki/Bayesian_linear_regression

  • https://en.wikipedia.org/wiki/Bayesian_multivariate_linear_regression





answered Jul 14 at 14:34, edited Oct 2 at 14:21
Sjoerd Smit

  • Awesome! Is there a way to set the prior? What prior did you use by default?
    – Roman
    Jul 14 at 14:36











  • @Roman Yes you can: "Please refer to the example notebook for information about specification of priors" ;) (I updated that sentence with the section where you can find it.)
    – Sjoerd Smit
    Jul 14 at 14:37











  • Nice. I believe version 12 provides a penalty in all fitting routines of Mathematica? This is equivalent to maximum a posteriori, where exp(penalty) represents the prior?
    – chris
    Jul 14 at 15:16






  • @chris Functions like LinearModelFit compute the Akaike Information Criterion and Bayesian Information Criterion. These serve a similar purpose, but are quite different from the Bayesian evidence. Furthermore, BayesianLinearRegression does not make point estimates (such as MAP) but retains the full distribution over the fit coefficients. Don't throw away information if you don't have to ;).
    – Sjoerd Smit
    Jul 14 at 15:23






  • Thank you for doing this work and sharing it with the community. I do find it frustrating that Wolfram has not made greater efforts to facilitate the use of the Bayesian paradigm and statistical models for those users, such as myself, who are not capable of building/programming these models/implementations themselves, but are nonetheless able and desirous of following along with the competent and professional implementations of others.
    – BeanSith
    Jul 17 at 3:12











