Counting most common combination of values in dataframe column
I have a DataFrame in the following form:
ID Product
1 A
1 B
2 A
3 A
3 C
3 D
4 A
4 B
I would like to count the most common combinations of two values from the Product column, grouped by ID.
For this example, the expected result would be:
Combination Count
A-B 2
A-C 1
A-D 1
C-D 1
Is this output possible with pandas?
python pandas
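To try the approaches discussed here, the example frame from the question can be rebuilt as in the sketch below. The variable name `df` is an assumption; the question does not name its frame.

```python
import pandas as pd

# Example data copied from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Products collected per ID, which is the grouping the question asks about.
products_per_id = df.groupby("ID")["Product"].apply(list)
print(products_per_id)
```

Note that ID 2 holds a single product, so it contributes no pairs to the expected result.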
asked Sep 19 at 19:46 by Alex T
5 Answers
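Use itertools.combinations, Series.explode (pandas 0.25+) and value_counts:
import itertools
# Build every 2-item combination per ID, flatten the lists with explode, then join and count
(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x,2)))
   .explode().str.join('-').value_counts())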
Out[611]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64
Or:
import itertools
(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x,2))))
.explode().value_counts())
Out[597]:
A-B 2
C-D 1
A-D 1
A-C 1
Name: Product, dtype: int64
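One caveat: itertools.combinations keeps the order in which products appear within each ID, so a group listed as B, A would be labelled B-A rather than A-B. A minimal sketch of one way to make the labels order-insensitive, assuming products are sortable strings and that repeated products within an ID should count once (the helper name `pair_labels` and the reordered sample data are hypothetical):

```python
import itertools

import pandas as pd

# Same data as the question, except ID 1 is listed as B, A (assumed for illustration).
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["B", "A", "A", "A", "C", "D", "A", "B"],
})


def pair_labels(products):
    """Return sorted 'X-Y' labels for every pair of distinct products."""
    unique_sorted = sorted(set(products))
    return ["-".join(pair) for pair in itertools.combinations(unique_sorted, 2)]


pair_counts = (
    df.groupby("ID")["Product"]
      .agg(pair_labels)   # one list of labels per ID
      .explode()          # one row per label; empty lists become NaN
      .dropna()
      .value_counts()
)
print(pair_counts)
```

Sorting inside the helper means the result does not depend on row order in the source frame.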
answered Sep 19 at 20:13 by Andy L. (edited Sep 19 at 20:29)
We can merge the frame with itself on ID and filter out the duplicate pairings (this assumes a default RangeIndex). We then sort each pair so that the grouping ignores order:
import pandas as pd
import numpy as np
df1 = df.reset_index()
df1 = df1.merge(df1, on='ID').query('index_x > index_y')
df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
df1.groupby([*df1]).size()
0  1
A  B    2
   C    1
   D    1
C  D    1
dtype: int64
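A variation on the same self-merge idea, sketched here under the assumption that product labels are comparable strings and unique within each ID: comparing the two product columns directly removes self-pairs and mirrored duplicates in one step, so neither the index nor the row-wise sort is needed. This is not the code above, only a related sketch.

```python
import pandas as pd

# Example data from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

# Self-merge pairs every row with every row that shares its ID.
pairs = df.merge(df, on="ID", suffixes=("_x", "_y"))

# Keeping Product_x < Product_y drops (A, A) self-pairs and the mirrored (B, A) rows.
pairs = pairs[pairs["Product_x"] < pairs["Product_y"]]

pair_counts = pairs.groupby(["Product_x", "Product_y"]).size()
print(pair_counts)
```

The merge materialises every within-ID pairing before filtering, so memory use grows with the square of the largest group size.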
answered Sep 19 at 19:57 by ALollz (edited Sep 19 at 20:12)
You can use combinations from itertools along with groupby and apply:
from itertools import combinations
import pandas as pd
def get_combs(x):
    # One row per 2-item combination of this ID's products
    return pd.DataFrame({'Combination': list(combinations(x.Product.values, 2))})
(df.groupby('ID').apply(get_combs)
   .reset_index(level=0)
   .groupby('Combination')
   .count()
)
ID
Combination
(A, B) 2
(A, C) 1
(A, D) 1
(C, D) 1
answered Sep 19 at 20:12 by stahamtan
Using itertools and Counter.
import itertools
from collections import Counter
agg_ = lambda x: tuple(itertools.combinations(x, 2))
product = list(itertools.chain(*df.groupby('ID').agg({'Product': lambda x: agg_(sorted(x))}).Product))
# Wrapping the chain in list() is optional; Counter accepts the iterator directly
counts = Counter(product)
Output
Counter({('A', 'B'): 2, ('A', 'C'): 1, ('A', 'D'): 1, ('C', 'D'): 1})
You could also do the following to get a DataFrame (with pandas imported as pd):
pd.DataFrame(list(counts.items()), columns=['combination', 'count'])
combination count
0 (A, B) 2
1 (A, C) 1
2 (A, D) 1
3 (C, D) 1
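Since the question asks for the most common combination, Counter.most_common can return it directly. A minimal sketch, assuming the frame is called `df` and updating the Counter group by group rather than through agg:

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Example data from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

counts = Counter()
for _, products in df.groupby("ID")["Product"]:
    # Sorting first makes each pair independent of row order within the ID.
    counts.update(combinations(sorted(products), 2))

# most_common(1) returns a list holding the single highest-count (pair, count) entry.
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('A', 'B') 2
```

Calling counts.most_common() without an argument returns every pair ordered by descending count.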
answered Sep 19 at 20:14 by Buckeye14Guy (edited Sep 19 at 20:27)
Another trick with the itertools.combinations function:
from itertools import combinations
import pandas as pd
test_df = ... # your df
counts_df = (test_df.groupby('ID')['Product'].agg(lambda x: list(combinations(x, 2)))
             .apply(pd.Series).stack().value_counts().to_frame()
             .reset_index().rename(columns={'index': 'Combination', 0: 'Count'}))
print(counts_df)
The output:
Combination Count
0 (A, B) 2
1 (A, C) 1
2 (A, D) 1
3 (C, D) 1
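The column labels that come out of value_counts().to_frame().reset_index() can differ between pandas releases, so a rename keyed on 'index' and 0 may not always match. A hedged sketch of one way to name both columns explicitly and reproduce the A-B style labels from the question, assuming the frame is called `df`:

```python
from itertools import combinations

import pandas as pd

# Example data from the question; the name `df` is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3, 4, 4],
    "Product": ["A", "B", "A", "A", "C", "D", "A", "B"],
})

counts_df = (
    df.groupby("ID")["Product"]
      .agg(lambda s: ["-".join(pair) for pair in combinations(sorted(s), 2)])
      .explode()
      .dropna()                      # IDs with a single product yield no pairs
      .value_counts()
      .rename_axis("Combination")    # name the index explicitly
      .reset_index(name="Count")     # name the count column explicitly
      .sort_values(["Count", "Combination"], ascending=[False, True], ignore_index=True)
)
print(counts_df)
```

The final sort only makes the row order deterministic when several pairs share the same count.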
answered Sep 19 at 20:18 by RomanPerekhrest