

How to measure the statistical “distance” between two frequency distributions?


I am undertaking a data analysis project which involves investigating website usage times over the course of the year. What I would like to do is compare how "consistent" the usage patterns are, say, how close they are to a pattern which involves using it for 1 hour once per week, or one which involves using it for 10 minutes a time, 6 times per week. I am aware of several things which can be calculated:




  • Shannon entropy: measures the uncertainty in the outcome, i.e. how far a probability distribution is from the uniform distribution (which has maximum entropy);


  • Kullback-Leibler divergence: measures how much one probability distribution differs from another;


  • Jensen-Shannon divergence: similar to the KL divergence, but symmetric and always finite;


  • Kolmogorov-Smirnov test: a test of whether two samples of a continuous random variable come from the same distribution, based on their empirical cumulative distribution functions;


  • Chi-squared test: a goodness-of-fit test of how well an observed frequency distribution matches an expected frequency distribution.

What I would like to do is compare how much the actual usage durations (blue) differ from ideal usage times (orange) in distribution. These distributions are discrete, and the versions below are normalised to become probability distributions. The horizontal axis represents the amount of time (in minutes) a user has spent on the website; this has been recorded for each day of the year; if the user has not gone on the website at all then this counts as a zero duration but these have been removed from the frequency distribution. On the right is the cumulative distribution function.



[Figure: distribution of website usage data (blue) versus ideal usage data (orange), with the cumulative distribution function on the right.]



My only problem is that, even though I can get the JS divergence to return a finite value, when I look at different users and compare their usage distributions to the ideal one, I get values that are mostly identical (so it is not a good indicator of how much they differ). Quite a bit of information is also lost when normalising to probability distributions rather than frequency distributions: if a student uses the platform 50 times, then the blue distribution should be vertically scaled so that the total length of the bars equals 50, and the orange bar should have a height of 50 rather than 1. Part of what we mean by "consistency" is whether how often a user visits the website affects how much they get out of it, so if the number of visits is lost then comparing probability distributions is a bit dubious: even if the probability distribution of a user's durations is close to the "ideal" usage, that user may only have used the platform for one week during the year, which is arguably not very consistent.
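For concreteness, a minimal sketch of this kind of comparison (Python, assuming NumPy and SciPy are available; the duration arrays are made-up placeholders, not the real data):

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    # Hypothetical per-day usage durations in minutes (zero-duration days removed).
    actual_minutes = np.array([12, 15, 10, 60, 8, 11, 14, 9, 13, 10])
    ideal_minutes = np.full(50, 10)  # e.g. 10 minutes a visit, 50 visits in the year

    # Bin both samples on a common grid to obtain frequency distributions.
    bins = np.arange(0, 65, 5)
    actual_freq, _ = np.histogram(actual_minutes, bins=bins)
    ideal_freq, _ = np.histogram(ideal_minutes, bins=bins)

    # jensenshannon() normalises both inputs to probabilities and returns the
    # square root of the JS divergence, so the total visit counts (10 vs 50)
    # drop out of the comparison -- the information loss described above.
    js_divergence = jensenshannon(actual_freq, ideal_freq) ** 2
    print(js_divergence)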



Are there any well-established techniques for comparing two frequency distributions and calculating some sort of metric which characterises how similar (or dissimilar) they are?










Tags: distributions, distance, frequency, comparison

Asked Sep 5 at 8:30 by omegaSQU4RED (edited Sep 5 at 8:38).


  • Comment – Acccumulation (Sep 5 at 17:37): You might want to start by asking yourself what your loss function is (that is, in what way a usage pattern differing from the ideal is bad, and how the amount of badness depends on what sort of divergence there is), and designing your metric around that.
3 Answers
Answer by S. Kolassa - Reinstate Monica (12 votes; answered Sep 5 at 8:44, edited Sep 5 at 8:56):

You may be interested in the Earth mover's distance, also known as the Wasserstein metric. It is implemented in R (look at the emdist package) and in Python. We also have a number of threads on it.


The EMD works for both continuous and discrete distributions. The emdist package for R works on discrete distributions.


The advantage over something like a $\chi^2$ statistic is that the EMD yields interpretable results. Picture your distributions as mounds of earth; the EMD then tells you how much earth you would need to transport, and how far, to turn one distribution into the other.


Put another way: the two distributions (1,0,0) and (0,1,0) should be "more similar" than (1,0,0) and (0,0,1). The EMD will recognize this and assign a smaller distance to the first pair than to the second. The $\chi^2$ statistic will assign the same distance to both pairs, because it has no notion of an ordering of the distribution entries.
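As a rough illustration, here is a minimal sketch of the 1D case using scipy.stats.wasserstein_distance (assuming SciPy is available; the frequency vectors below are made-up placeholders, not the data from the question):

    import numpy as np
    from scipy.stats import wasserstein_distance

    # The three-bin example above, written as point masses at positions 0, 1, 2:
    print(wasserstein_distance([0], [1]))  # (1,0,0) vs (0,1,0): distance 1
    print(wasserstein_distance([0], [2]))  # (1,0,0) vs (0,0,1): distance 2

    # For general frequency distributions, pass the bin positions (here, the bin
    # midpoints in minutes) as values and the unnormalised counts as weights.
    # Note that the weights are normalised internally, so again only the shapes
    # of the two histograms are compared, not the total counts.
    midpoints = np.array([2.5, 7.5, 12.5, 17.5, 22.5])
    actual_counts = np.array([3, 10, 25, 8, 4])   # hypothetical user
    ideal_counts = np.array([0, 0, 50, 0, 0])     # hypothetical "ideal" pattern
    print(wasserstein_distance(midpoints, midpoints,
                               u_weights=actual_counts, v_weights=ideal_counts))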






  • Comment – user2974951 (Sep 5 at 8:47): Why that particular distance? It seems to be designed for any continuous distribution. OP has a frequency distribution, so why not a more "discrete" distance such as chi-square?










  • Comment – S. Kolassa - Reinstate Monica (Sep 5 at 8:56): @user2974951: fair enough. See my edit.










  • Comment – Marco Cuturi (Sep 10 at 20:42): If the distributions are 1D, as suggested in the question, then running an EMD solver is overkill. What needs to be done in that case is just to compute an $L_p$ metric between the empirical quantile functions of both densities (essentially the inverses of the CDFs plotted in the question). See p. 31 in arxiv.org/abs/1803.00567 or, if you need a more detailed account, chapter 2 of math.u-psud.fr/~filippo/OTAM-cvgmt.pdf











  • Comment – S. Kolassa - Reinstate Monica (Sep 11 at 8:57): @MarcoCuturi: an $L^p$ distance is certainly another possibility. However, it will again assign the same distance between (1,0,0) and (0,1,0) as between (1,0,0) and (0,0,1), which is a bit unintuitive. Whether the added intuitive niceness of the EMD is worth the additional complexity is something the OP can consider.
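As a side note, a minimal sketch of the 1D quantile-function approach suggested in Marco Cuturi's comment above (assuming NumPy, taking p = 1, and using two made-up samples):

    import numpy as np

    def wasserstein_1d(a, b, n_grid=1000):
        """Approximate the L1 distance between the empirical quantile functions
        of two 1D samples, i.e. the 1D Wasserstein-1 distance / EMD."""
        qs = (np.arange(n_grid) + 0.5) / n_grid   # grid of quantile levels in (0, 1)
        return np.mean(np.abs(np.quantile(a, qs) - np.quantile(b, qs)))

    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 1.0, size=365)   # two hypothetical samples
    b = rng.normal(0.5, 1.0, size=50)
    print(wasserstein_1d(a, b))          # close to the true shift of 0.5, up to sampling noise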


















Answer by mkt - Reinstate Monica (3 votes; answered Sep 5 at 8:47):

If you randomly sample an individual from each of the two distributions, you can calculate a difference between them. If you repeat this (with replacement) a number of times, you can generate a distribution of differences that contains all the information you are after. You can plot this distribution and characterize it with any summary statistics you desire - means, medians, etc.
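A minimal sketch of this resampling idea (Python, assuming NumPy; the two duration arrays are hypothetical placeholders):

    import numpy as np

    rng = np.random.default_rng(42)
    actual = np.array([12, 15, 10, 60, 8, 11, 14, 9])   # hypothetical observed durations
    ideal = np.array([10, 10, 10, 10, 10, 10])          # hypothetical "ideal" durations

    # Repeatedly draw one value from each sample (with replacement) and record
    # the difference; the resulting distribution of differences can then be
    # plotted or summarised however you like.
    n_draws = 10_000
    diffs = rng.choice(actual, size=n_draws) - rng.choice(ideal, size=n_draws)

    print(np.mean(diffs), np.median(diffs), np.percentile(diffs, [2.5, 97.5]))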






  • Comment – user2974951 (Sep 5 at 8:55): Is there a name for such a procedure?






  • Comment – Richard Hardy (Sep 5 at 10:44): I wonder how one could account for the baseline fact that the distribution of differences for an arbitrary distribution and itself will be different for different arbitrary distributions; think U(0,1) vs. itself compared to N(0,1) vs. itself. Hence, the distribution of differences you would get comparing two different distributions would be hard to evaluate in the absence of a unique baseline. The problem disappears if the observations are paired; then the baseline would be a unit mass at zero.











  • Comment – mkt - Reinstate Monica (Sep 5 at 11:02): @user2974951 I'm sure there is, as it's pretty simple and clearly related to bootstrapping. But I don't know what to call it precisely.






  • Comment – Richard Hardy (Sep 5 at 15:18): @mkt, thanks for your clarification. Without intending to argue just for the sake of it, I still think that without a unique baseline we do not quite have a ruler. But I will leave it at that. There is something nice about your idea anyway.







  • Comment – mkt - Reinstate Monica (Sep 5 at 15:42): @RichardHardy I appreciate the exchange here, and you might well be right. I'll have to think about this further.


















Answer by user9003011 (0 votes; answered Sep 11 at 12:26):

One such metric is the Hellinger distance between two distributions characterised by their means and standard deviations. An application can be found in the following article:



https://www.sciencedirect.com/science/article/pii/S1568494615005104
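As a side note, for discrete frequency distributions such as those in the question, the Hellinger distance can also be computed directly from the normalised bin probabilities; a minimal sketch assuming NumPy (the count vectors are hypothetical):

    import numpy as np

    def hellinger(p, q):
        """Hellinger distance between two discrete distributions given as
        (unnormalised) count or probability vectors over the same bins."""
        p = np.asarray(p, dtype=float) / np.sum(p)
        q = np.asarray(q, dtype=float) / np.sum(q)
        return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

    # Result lies between 0 (identical distributions) and 1 (disjoint supports).
    print(hellinger([3, 10, 25, 8, 4], [0, 0, 50, 0, 0]))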






  • Comment – omegaSQU4RED (Sep 11 at 13:17): Thanks for that. I have seen that there is a whole family of divergences (f-divergences) which do what I want, but a quick flick through the literature doesn't seem to indicate which is best when... do you know of any good literature on this?











