How to Compute the Brier Score for more than Two Classes

tl;dr

How do I correctly compute the Brier score for more than two classes? I got confusing results with different approaches. Details below.

As suggested to me in a comment to this question, I would like to evaluate the quality of a set of classifiers I trained using the Brier score. These are multiclass classifiers and the classes are imbalanced, two conditions the Brier score should be able to handle. However, I am not confident that I am computing the Brier score correctly. Say I have 10 data points and 5 classes:

One-hot vectors indicate the true class of each data item:

import numpy as np

targets = np.array([[0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 1],
                    [0, 1, 0, 0, 0],
                    [0, 0, 0, 0, 1],
                    [0, 0, 1, 0, 0],
                    [1, 0, 0, 0, 0],
                    [0, 1, 0, 0, 0],
                    [1, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0]])

Vectors of probabilities represent the outputs of my classifiers, assigning a probability to each class:

probs = np.array([[0.14, 0.38, 0.4 , 0.04, 0.05],
                  [0.55, 0.05, 0.34, 0.04, 0.01],
                  [0.3 , 0.35, 0.18, 0.09, 0.08],
                  [0.23, 0.22, 0.04, 0.05, 0.46],
                  [0.  , 0.15, 0.47, 0.28, 0.09],
                  [0.23, 0.13, 0.34, 0.27, 0.03],
                  [0.32, 0.06, 0.59, 0.02, 0.01],
                  [0.01, 0.19, 0.01, 0.03, 0.75],
                  [0.27, 0.38, 0.03, 0.12, 0.2 ],
                  [0.17, 0.45, 0.11, 0.25, 0.01]])

These matrices are coindexed: probs[i, j] is the predicted probability that item i belongs to class j, and targets[i, j] is 1 if it does and 0 otherwise.
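Since the question moves between integer class labels (via argmax, further down) and one-hot rows, here is a minimal sketch of the conversion in both directions. labels_to_one_hot is a hypothetical helper name, and it assumes labels are integers from 0 to R-1.

```python
import numpy as np

def labels_to_one_hot(labels, n_classes):
    # Row c of the identity matrix is the one-hot vector for class c,
    # so indexing it with the label array builds the whole target matrix.
    return np.eye(n_classes, dtype=int)[np.asarray(labels)]

# True classes of the 10 items in `targets` above.
true_classes = np.array([4, 4, 4, 1, 4, 2, 0, 1, 0, 0])
targets = labels_to_one_hot(true_classes, 5)

# Going back: argmax over each one-hot row recovers the label.
assert (np.argmax(targets, axis=1) == true_classes).all()
```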

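According to Wikipedia, the Brier score for multiple classes is defined as

$$\frac{1}{N} \sum_{t=1}^{N} \sum_{i=1}^{R} (f_{ti} - o_{ti})^2$$

where $N$ is the number of items, $R$ the number of classes, $f_{ti}$ the predicted probability of class $i$ for item $t$, and $o_{ti}$ equals 1 if item $t$ belongs to class $i$ and 0 otherwise.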

When I implement this in Python and run it on the targets and probs matrices above, I get a result of $1.0069$:

>>> def brier_multi(targets, probs):
...     return np.mean(np.sum((probs - targets) ** 2, axis=1))
... 
>>> brier_multi(targets, probs)
1.0068899999999998

But I am not sure if I interpreted the definition correctly.
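One way to check the interpretation is to transcribe the double sum literally and compare it with the vectorised version. This is a minimal sketch; brier_multi_loop is a hypothetical name, and the inputs are assumed to be NumPy arrays of shape (N, R).

```python
import numpy as np

def brier_multi_loop(targets, probs):
    # Literal transcription of (1/N) * sum_t sum_i (f_ti - o_ti)^2:
    # the outer loop runs over the N items t, the inner over the R classes i.
    n_items, n_classes = probs.shape
    total = 0.0
    for t in range(n_items):
        for i in range(n_classes):
            total += (probs[t, i] - targets[t, i]) ** 2
    return total / n_items

def brier_multi(targets, probs):
    # Vectorised version from the question: sum over classes (axis=1),
    # then mean over items.
    return np.mean(np.sum((probs - targets) ** 2, axis=1))
```

On the targets and probs arrays above, both functions return 1.00689 up to floating-point rounding, so axis=1 does sum over the classes and np.mean averages over the items, as the formula requires.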

For Python, the sklearn library provides sklearn.metrics.brier_score_loss. The documentation states:

The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false

However, what the function actually does is take one of the $n > 2$ classes (either chosen by the function or passed as an argument), treat that class as class $1$, and treat all other classes as class $0$.

For example, if we choose class 3 (index 2) as the $1$ class and thus all other classes as class $0$, we get:

>>> from sklearn.metrics import brier_score_loss
>>> # get true classes by argmax over the one-hot rows
... true_classes = np.argmax(targets, axis=1)
>>> 
>>> brier_score_loss(true_classes, probs[:,2], pos_label=2)
0.13272999999999996

Alternatively:

>>> brier_score_loss(targets[:,2], probs[:,2])
0.13272999999999996

This is indeed the binary version of the Brier score, as can be shown by manually defining and running it:

>>> def brier_bin(targets, probs):
...     return np.mean((targets - probs) ** 2)
... 
>>> brier_bin(targets[:,2], probs[:,2])
0.13272999999999996

As you can see, this is the same result as with sklearn's brier_score_loss.
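Because the double sum in the formula can be evaluated in either order, the multiclass score is exactly the sum of the R one-vs-rest binary Brier scores, one per class. The sketch below makes that concrete with plain NumPy; per_class_brier is a hypothetical helper, not an sklearn function.

```python
import numpy as np

def brier_bin(targets, probs):
    # Binary Brier score: mean squared difference between a 0/1 indicator
    # column and the predicted probability of that same event.
    return np.mean((targets - probs) ** 2)

def per_class_brier(targets, probs):
    # One one-vs-rest binary Brier score per class (column j vs. the rest).
    return np.array([brier_bin(targets[:, j], probs[:, j])
                     for j in range(probs.shape[1])])

def brier_multi(targets, probs):
    # Multiclass Brier score: sum over classes, mean over items.
    return np.mean(np.sum((probs - targets) ** 2, axis=1))
```

On the arrays above, entry 2 of per_class_brier(targets, probs) reproduces the 0.13273 obtained with brier_score_loss, and the five entries sum to the 1.00689 returned by brier_multi. A single one-vs-rest score therefore captures only one of the R terms of the multiclass score.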

Wikipedia states about the binary version:

This formulation is mostly used for binary events (for example "rain"
or "no rain"). The above equation is a proper scoring rule only for
binary events;
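The multiclass formula, by contrast, remains proper when there are more than two categories. For a single item whose true class is drawn from a distribution $q$, the expected score of a forecast $f$ works out to $\lVert f - q\rVert^2 + 1 - \lVert q\rVert^2$, which is smallest when $f = q$. The sketch below checks this numerically by enumerating the R possible one-hot outcomes; expected_brier is a hypothetical helper, and the distribution q is made up for illustration.

```python
import numpy as np

def expected_brier(f, q):
    # Expected multiclass Brier term for one item when the true class is
    # drawn from q and the forecast is f: the squared error against each
    # one-hot outcome, weighted by that outcome's probability.
    f = np.asarray(f, dtype=float)
    q = np.asarray(q, dtype=float)
    outcomes = np.eye(len(q))               # row c = one-hot vector for class c
    per_outcome = np.sum((f - outcomes) ** 2, axis=1)
    return np.dot(q, per_outcome)

q = np.array([0.5, 0.3, 0.2])               # hypothetical true distribution
honest = expected_brier(q, q)
for f in ([1.0, 0.0, 0.0], [0.4, 0.4, 0.2], [1/3, 1/3, 1/3]):
    # Every forecast other than q has a strictly larger expected score.
    assert expected_brier(f, q) > honest
```

For this q the honest forecast scores 0.62, while the three alternatives score 1.0, 0.64 and roughly 0.667.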

So now I am confused and have the following questions:

1) If sklearn computes the multiclass Brier score as a one-vs-all binary score, is that the only correct way to compute the multiclass Brier score?

Which leads me to

2) If that is so, my brier_multi code must be based on a misconception. What is my misconception about the definition of the multiclass Brier score?

3) Maybe I am on the wrong track altogether. If so, please explain how to compute the Brier score correctly.

edited Apr 17 at 11:11

asked Apr 17 at 10:42

lo tolmencre

classification scikit-learn model-evaluation scoring-rules

1 Answer

Wikipedia's version of the Brier score for multiple categories is correct. Compare the original publication by Brier (1950), or any number of academic publications, e.g. Czado et al. (2009) (equation (6), though you would need to do some simple arithmetic and drop a constant 1 to arrive at Brier's formulation).
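The "simple arithmetic" can be sketched in the notation of the question's formula. For item $t$ with true class $c_t$, so that $o_{tc_t} = 1$ and $o_{ti} = 0$ for every other class, expanding the square gives

$$\sum_{i=1}^{R} (f_{ti} - o_{ti})^2 = \sum_{i=1}^{R} f_{ti}^2 - 2 f_{tc_t} + 1$$

Dropping the constant 1 leaves $\sum_{i} f_{ti}^2 - 2 f_{tc_t}$, the negatively oriented quadratic score, so averaging either quantity over the $N$ items ranks forecasts identically.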

If sklearn calculates a binary "one against all" Brier score and averages over all choices of a focal class, then it can certainly do so. However, it is simply not the Brier score. Passing it off as such is misleading and wrong.

The misconception lies entirely with sklearn.

Just use your brier_multi; it's completely correct.
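If the true classes are stored as integer labels rather than one-hot rows, expanding the square for a one-hot outcome gives $\sum_i f_{ti}^2 - 2 f_{tc_t} + 1$ per item, which allows a version that never builds the one-hot matrix. This is a sketch under stated assumptions: brier_multi_from_labels is a hypothetical name, labels are integers from 0 to R-1, and row t of probs holds the predicted class probabilities for item t.

```python
import numpy as np

def brier_multi_from_labels(labels, probs):
    # For item t with true class c_t, sum_i (f_ti - o_ti)^2 expands to
    # sum_i f_ti^2 - 2 * f_{t, c_t} + 1, so no one-hot matrix is needed.
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    if probs.ndim != 2 or labels.shape != (probs.shape[0],):
        raise ValueError("expected labels of shape (N,) and probs of shape (N, R)")
    p_true = probs[np.arange(len(labels)), labels]   # f_{t, c_t} for each item t
    return np.mean(np.sum(probs ** 2, axis=1) - 2 * p_true + 1)
```

With true_classes = np.argmax(targets, axis=1), brier_multi_from_labels(true_classes, probs) should agree with brier_multi(targets, probs) up to floating-point rounding.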

answered Apr 17 at 11:04

Stephan Kolassa
