What loss function to use when labels are probabilities?


What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.



It seems like cross-entropy doesn't make sense here, since it assumes that a single target is the correct label.



Would something like MSE (after applying softmax) make sense, or is there a better loss function?
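For concreteness, a minimal NumPy sketch of the setup being described, including the "MSE after softmax" idea from the previous sentence; the logit values are invented purely for illustration:

```python
import numpy as np

# Invented values, purely for illustration.
logits = np.array([1.2, 0.3, -0.5])   # raw outputs of the 3-output model
y = np.array([0.2, 0.3, 0.5])         # probability-valued target

# Softmax turns the logits into a predicted distribution.
q = np.exp(logits - logits.max())
q = q / q.sum()

# The "MSE after softmax" idea from the question.
mse_loss = np.mean((q - y) ** 2)
print(q, mse_loss)
```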










Tags: neural-networks, machine-learning, loss-functions, probability-distribution






          1 Answer
          Actually, the cross-entropy loss function would be appropriate here, since it measures the "distance" between a distribution $q$ and the "true" distribution $p$.



          You are right, though, that using a loss function called "cross_entropy" in many APIs would be a mistake. This is because these functions, as you said, assume a one-hot label. You would need to use the general cross-entropy function,



$$H(p,q)=-\sum_{x\in X} p(x) \log q(x).$$
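As a concrete illustration (not part of the original answer), here is a minimal NumPy sketch of this general cross-entropy with a probability-valued target; the predicted distribution is made up, and a small epsilon guards against taking the log of zero:

```python
import numpy as np

def soft_cross_entropy(p, q, eps=1e-12):
    """General cross-entropy H(p, q) = -sum_x p(x) * log q(x).

    p: target distribution (may be soft, e.g. [0.2, 0.3, 0.5])
    q: predicted distribution (e.g. the model's softmax output)
    """
    return -np.sum(p * np.log(q + eps))

p = np.array([0.2, 0.3, 0.5])        # probability-valued label
q = np.array([0.25, 0.35, 0.40])     # hypothetical model prediction
print(soft_cross_entropy(p, q))
```

Averaging this quantity over a batch gives the training loss.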



          Note that one-hot labels would mean that
$$
p(x) =
\begin{cases}
1 & \text{if } x \text{ is the true label}\\
0 & \text{otherwise}
\end{cases}
$$



          which causes the cross-entropy $H(p,q)$ to reduce to the form you're familiar with:



$$H(p,q) = -\log q(x_{\text{label}})$$
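A quick numerical check of this reduction (again just an illustrative sketch with made-up numbers):

```python
import numpy as np

q = np.array([0.25, 0.35, 0.40])       # hypothetical predicted distribution
p_onehot = np.array([0.0, 0.0, 1.0])   # one-hot target: class 2 is the true label

full = -np.sum(p_onehot * np.log(q))   # general cross-entropy H(p, q)
reduced = -np.log(q[2])                # the familiar -log q(x_label)
print(full, reduced)                   # both equal -log(0.40)
```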





