How to measure the statistical “distance” between two frequency distributions?
I am undertaking a data analysis project which involves investigating website usage times over the course of the year. What I would like to do is compare how "consistent" the usage patterns are, say, how close they are to a pattern which involves using the site for 1 hour once per week, or one which involves using it for 10 minutes at a time, 6 times per week. I am aware of several things that can be calculated:
Shannon entropy: measures the uncertainty in a distribution's outcomes; it is maximal for a uniform distribution, so it can also be read as how far a distribution is from uniform;
Kullback-Leibler divergence: measures how much one probability distribution differs from another (it is asymmetric and can be infinite);
Jensen-Shannon divergence: a symmetrised, smoothed version of the KL divergence that always returns finite values;
Kolmogorov-Smirnov test: a test of whether two samples of a continuous random variable come from the same distribution, based on the maximum difference between their empirical cumulative distribution functions;
Chi-squared test: a goodness-of-fit test of whether an observed frequency distribution differs from an expected frequency distribution.
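For concreteness, here is a minimal sketch of how these quantities can be computed with scipy, assuming the actual and ideal usage are given as counts over the same duration bins (the numbers, and the slightly smoothed "ideal", are made up for illustration):

```python
import numpy as np
from scipy.stats import entropy, chisquare
from scipy.spatial.distance import jensenshannon

# Made-up daily-duration counts over the same bins (both total 50 visits).
actual_counts = np.array([30.0, 10.0, 6.0, 3.0, 1.0])  # mostly short visits
ideal_counts = np.array([2.0, 6.0, 34.0, 6.0, 2.0])    # concentrated around one duration,
                                                        # slightly smoothed so KL stays finite

# Normalise to probability distributions.
p = actual_counts / actual_counts.sum()
q = ideal_counts / ideal_counts.sum()

print(entropy(p))                # Shannon entropy of the actual distribution
print(entropy(p, q))             # Kullback-Leibler divergence D(p || q)
print(jensenshannon(p, q) ** 2)  # Jensen-Shannon divergence (the function returns its square root)
print(chisquare(actual_counts, ideal_counts))  # chi-squared goodness-of-fit (totals must match)
# The Kolmogorov-Smirnov test works on raw samples rather than binned counts:
# scipy.stats.ks_2samp(actual_durations, ideal_durations)
```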
What I would like to do is compare how much the actual usage durations (blue) differ from ideal usage times (orange) in distribution. These distributions are discrete, and the versions below are normalised to become probability distributions. The horizontal axis represents the amount of time (in minutes) a user has spent on the website; this has been recorded for each day of the year; if the user has not gone on the website at all then this counts as a zero duration but these have been removed from the frequency distribution. On the right is the cumulative distribution function.
My only problem is that, even though I can get the JS divergence to return a finite value, when I look at different users and compare their usage distributions to the ideal one, I get values that are mostly identical (and therefore not a good indicator of how much they differ). Also, quite a bit of information is lost when normalising to probability distributions rather than frequency distributions: if a student uses the platform 50 times, then the blue distribution should be vertically scaled so that the total length of the bars equals 50, and the orange bar should have a height of 50 rather than 1. Part of what we mean by "consistency" is whether how often a user visits the website affects how much they get out of it. If the number of visits is lost, then comparing probability distributions is a bit dubious: even if the probability distribution of a user's durations is close to the "ideal" usage, that user may only have used the platform for 1 week during the year, which arguably is not very consistent.
Are there any well-established techniques for comparing two frequency distributions and calculating some sort of metric which characterises how similar (or dissimilar) they are?
Tags: distributions, distance, frequency, comparison
asked Sep 5 at 8:30 (edited Sep 5 at 8:38) by omegaSQU4RED
You might want to start by asking yourself what your loss function is (that is, in what way is a usage pattern that differs from the ideal bad, and how does the amount of badness depend on the kind of divergence there is), and designing your metric around that.
– Acccumulation, Sep 5 at 17:37
3 Answers
You may be interested in the Earth mover's distance (EMD), also known as the Wasserstein metric. It is implemented in R (look at the emdist package) and in Python. We also have a number of threads on it.
The EMD works for both continuous and discrete distributions. The emdist package for R works on discrete distributions.
The advantage over something like a $\chi^2$ statistic is that the EMD yields interpretable results. Picture your distributions as mounds of earth; the EMD then tells you how much earth you would need to transport how far to turn one distribution into the other.
Put another way: two distributions (1,0,0) and (0,1,0) should be "more similar" than (1,0,0) and (0,0,1). The EMD will recognize this and assign a smaller distance to the first pair than to the second. The $\chi^2$ statistic will assign the same distance to both pairs, because it has no notion of an ordering of the distribution entries.
– answered Sep 5 at 8:44 (edited Sep 5 at 8:56) by S. Kolassa - Reinstate Monica
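As a concrete illustration (a minimal sketch, not part of the original answer), scipy's one-dimensional Wasserstein distance reproduces exactly the ordering described above; the three bins are assumed to sit at positions 0, 1 and 2:

```python
import numpy as np
from scipy.stats import wasserstein_distance

positions = np.array([0, 1, 2])   # common support of the three-bin distributions
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
c = np.array([0.0, 0.0, 1.0])

# The EMD respects the ordering of the bins:
print(wasserstein_distance(positions, positions, u_weights=a, v_weights=b))  # 1.0
print(wasserstein_distance(positions, positions, u_weights=a, v_weights=c))  # 2.0
```

For one-dimensional histograms this single call is enough; heavier emdist-style solvers are only needed for higher-dimensional distributions.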
Why that particular distance? It seems to be designed for any continuous distribution. The OP has a frequency distribution, so why not a more "discrete" distance such as chi-square?
– user2974951, Sep 5 at 8:47
@user2974951: fair enough. See my edit.
– S. Kolassa - Reinstate Monica, Sep 5 at 8:56
If the distributions are 1D, as suggested in the question, then running an EMD solver is overkill. What needs to be done in that case is just to compute an $L_p$ metric between the empirical quantile functions of both densities (essentially the inverses of the CDFs plotted in the question). See p. 31 of arxiv.org/abs/1803.00567 or, for a more detailed account, chapter 2 of math.u-psud.fr/~filippo/OTAM-cvgmt.pdf.
– Marco Cuturi, Sep 10 at 20:42
@MarcoCuturi: an $L^p$ distance is certainly another possibility. However, it will again assign the same distance between (1,0,0) and (0,1,0) as between (1,0,0) and (0,0,1), which is a bit unintuitive. Whether the added intuitive niceness of the EMD is worth the additional complexity is something the OP can consider.
– S. Kolassa - Reinstate Monica, Sep 11 at 8:57
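A minimal sketch of the 1D shortcut Marco Cuturi describes, under the assumption that both usage records are samples of equal size (made-up data): with equal sizes the empirical quantile functions are just the sorted samples, so the 1-Wasserstein distance reduces to the mean absolute difference of the order statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
actual = rng.gamma(shape=2.0, scale=10.0, size=365)  # made-up daily durations (minutes)
ideal = np.full(365, 60.0)                           # ideal: 60 minutes every day

# 1-Wasserstein distance via the sorted samples (empirical quantile functions).
w1 = np.mean(np.abs(np.sort(actual) - np.sort(ideal)))
print(w1)
```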
If you randomly sample an individual from each of the two distributions, you can calculate a difference between them. If you repeat this (with replacement) a number of times, you can generate a distribution of differences that contains all the information you are after. You can plot this distribution and characterize it with any summary statistics you desire: means, medians, etc.
– answered Sep 5 at 8:47 by mkt - Reinstate Monica
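A minimal sketch of this resampling idea, assuming the two usage records are available as arrays of per-visit durations (made-up data):

```python
import numpy as np

rng = np.random.default_rng(1)
actual = rng.gamma(shape=2.0, scale=10.0, size=50)  # 50 observed visit durations (made up)
ideal = np.full(52, 60.0)                           # ideal pattern: 60 minutes, once a week

# Repeatedly draw one value from each record (with replacement) and store the difference.
n_draws = 10_000
diffs = rng.choice(actual, size=n_draws) - rng.choice(ideal, size=n_draws)

# Summarise the distribution of differences however you like.
print(np.mean(diffs), np.median(diffs), np.percentile(diffs, [2.5, 97.5]))
```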
Is there a name for such a procedure?
– user2974951, Sep 5 at 8:55
I wonder how one could account for the baseline fact that the distribution of differences for an arbitrary distribution and itself will be different for different arbitrary distributions; think U(0,1) vs. itself compared to N(0,1) vs. itself. Hence, the distribution of differences you would get comparing two different distributions would be hard to evaluate in the absence of a unique baseline. The problem disappears if the observations are paired; then the baseline would be a unit mass at zero.
– Richard Hardy, Sep 5 at 10:44
@user2974951 I'm sure there is, as it's pretty simple and clearly related to bootstrapping. But I don't know what to call it precisely.
– mkt - Reinstate Monica, Sep 5 at 11:02
@mkt, thanks for your clarification. Without intending to argue just for the sake of it, I still think that without a unique baseline we do not quite have a ruler. But I will leave it at that. There is something nice about your idea anyway.
– Richard Hardy, Sep 5 at 15:18
@RichardHardy I appreciate the exchange here, and you might well be right. I'll have to think about this further.
– mkt - Reinstate Monica, Sep 5 at 15:42
One such metric is the Hellinger distance between two distributions characterised by their means and standard deviations. An application can be found in the following article:
https://www.sciencedirect.com/science/article/pii/S1568494615005104
– answered Sep 11 at 12:26 by user9003011
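For discrete distributions like the ones in the question, the Hellinger distance has a simple closed form; for two normal distributions characterised by their means and standard deviations there is likewise a closed form. A minimal sketch of both, with made-up numbers:

```python
import numpy as np

def hellinger_discrete(p, q):
    # Hellinger distance between two discrete probability distributions on the same support.
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def hellinger_normal(mu1, sd1, mu2, sd2):
    # Closed form for two univariate normal distributions.
    var_sum = sd1 ** 2 + sd2 ** 2
    bc = np.sqrt(2 * sd1 * sd2 / var_sum) * np.exp(-((mu1 - mu2) ** 2) / (4 * var_sum))
    return np.sqrt(1 - bc)

print(hellinger_discrete([0.6, 0.3, 0.1], [0.1, 0.3, 0.6]))  # made-up discrete distributions
print(hellinger_normal(30.0, 15.0, 60.0, 5.0))               # made-up means/sds in minutes
```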
Thanks for that. I have seen that there is a whole family of divergences (f-divergences) which do what I want, but a quick flick through the literature doesn't seem to indicate which is best when... do you know of any good literature on this?
– omegaSQU4RED, Sep 11 at 13:17