What loss function to use when labels are probabilities?


What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



It seems like cross-entropy doesn't make sense here, since it assumes that a single target is the correct label.



Would something like MSE (after applying softmax) make sense, or is there a better loss function?










Tags: neural-networks, machine-learning, loss-functions, probability-distribution






asked Apr 14 at 22:13 by Thomas Johnson, edited Apr 15 at 10:11 by nbro
1 Answer

          Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



          You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



$$H(p,q) = -\sum_{x \in X} p(x) \log q(x).$$
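For a concrete check of this formula, here is a minimal, framework-agnostic sketch in NumPy; the predicted distribution below is made up purely for illustration and is not part of the original answer:

    import numpy as np

    def cross_entropy(p, q, eps=1e-12):
        # General cross-entropy H(p, q) = -sum_x p(x) * log q(x).
        # p: target probabilities, e.g. [0.2, 0.3, 0.5]
        # q: predicted probabilities (should sum to 1); eps guards against log(0)
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return -np.sum(p * np.log(q + eps))

    target = [0.2, 0.3, 0.5]                    # soft label from the question
    prediction = [0.25, 0.25, 0.5]              # hypothetical model output
    print(cross_entropy(target, prediction))    # ~1.04
    print(cross_entropy(target, target))        # ~1.03, the minimum over q (equals the entropy of p)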



Note that one-hot labels would mean that
$$
p(x) =
\begin{cases}
1 & \text{if } x \text{ is the true label}\\
0 & \text{otherwise,}
\end{cases}
$$



          which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



$$H(p,q) = -\log q(x_{\text{label}}).$$
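In practice, the general (soft-label) form is only a couple of lines on top of a log-softmax output. Below is a rough sketch along PyTorch lines; the model, data, and learning rate are placeholders rather than part of the original answer, and some frameworks have since added soft-target support to their built-in cross-entropy losses, so it is worth checking your library's documentation before rolling your own:

    import torch
    import torch.nn.functional as F

    N = 10                                   # number of input features (placeholder)
    model = torch.nn.Linear(N, 3)            # 3 raw scores (logits), one per class
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(4, N)                    # a batch of 4 feature vectors
    y = torch.tensor([[0.2, 0.3, 0.5]] * 4)  # soft targets; each row sums to 1

    logits = model(x)
    log_q = F.log_softmax(logits, dim=1)     # log of the predicted distribution q
    loss = -(y * log_q).sum(dim=1).mean()    # H(p, q), averaged over the batch

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()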






answered Apr 14 at 22:38 by Philip Raeisghasem



