Using pandas, how to find the value counts by two columns

Problem description

I have a pandas DataFrame with three columns: Protein_A, Protein_B, and Interaction.

I want to find all the interactions as value counts, grouping by Protein_A and Protein_B. The order of Protein_A and Protein_B should not matter when grouping, so (A1BG, ABCC6) and (ABCC6, A1BG) belong to the same group.

    Protein_A   Interaction      Protein_B
0   A1BG        ER               A2M
1   A1BG        MI               ABCC6
2   ABCC6       AS               A1BG
3   A1BG        MI               ADAM10
4   A1BG        MI               ADAM17

The result will look something like this:

{A1BG, A2M}     -> ER
{A1BG, ABCC6}   -> MI, AS
{A1BG, ADAM10}  -> MI
{A1BG, ADAM17}  -> MI

Tags: python, pandas, pandas-groupby

Solution

Grouping the rows is the right idea, but your expected result shows that you actually want a list of interaction codes for each group rather than value counts.

To build such a list for each group, proceed as follows:

Start by defining a function that computes the grouping key: the two protein codes (A and B) of a row, sorted and joined into a string, so that the order of the proteins does not matter:

def protSorted(key):
    # groupby calls this function with each index label of df
    row = df.loc[key]
    return ', '.join(sorted([row.Protein_A, row.Protein_B]))

Then group the source DataFrame by this function, take the Interaction column from each group, and collect the interaction codes into a list:

df.groupby(protSorted).Interaction.apply(list)

For your sample data, the result is a Series like below:

A1BG, A2M           [ER]
A1BG, ABCC6     [MI, AS]
A1BG, ADAM10        [MI]
A1BG, ADAM17        [MI]
Name: Interaction, dtype: object
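
To try this end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the five sample rows in the question, so the construction of df is an assumption about how your data is loaded; the function and the groupby call are the ones shown above.

```python
import pandas as pd

# Sample rows copied from the question (assumed layout of the real data).
df = pd.DataFrame({
    "Protein_A": ["A1BG", "A1BG", "ABCC6", "A1BG", "A1BG"],
    "Interaction": ["ER", "MI", "AS", "MI", "MI"],
    "Protein_B": ["A2M", "ABCC6", "A1BG", "ADAM10", "ADAM17"],
})

def protSorted(key):
    # groupby passes each index label; look up that row and build
    # a key that is the same for (A, B) and (B, A).
    row = df.loc[key]
    return ", ".join(sorted([row.Protein_A, row.Protein_B]))

# One list of interaction codes per unordered protein pair.
result = df.groupby(protSorted).Interaction.apply(list)
print(result)
```

Rows 1 and 2 (A1BG/ABCC6 and ABCC6/A1BG) end up in the same group because both produce the key "A1BG, ABCC6".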

If you would rather have a plain string for each group (without the surrounding brackets), run this instead:

df.groupby(protSorted).Interaction.apply(', '.join)

This time the result is:

A1BG, A2M           ER
A1BG, ABCC6     MI, AS
A1BG, ADAM10        MI
A1BG, ADAM17        MI
Name: Interaction, dtype: object
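
If you do need actual counts, as the question title suggests, one possible variation is sketched below. It is not part of the approach above: the names pair_key and counts are hypothetical, and the key is precomputed as a Series instead of being looked up row by row with df.loc, which may be noticeably faster on a large DataFrame. The sample data is again rebuilt by hand from the question.

```python
import pandas as pd

# Sample rows copied from the question (assumed layout of the real data).
df = pd.DataFrame({
    "Protein_A": ["A1BG", "A1BG", "ABCC6", "A1BG", "A1BG"],
    "Interaction": ["ER", "MI", "AS", "MI", "MI"],
    "Protein_B": ["A2M", "ABCC6", "A1BG", "ADAM10", "ADAM17"],
})

# Hypothetical helper Series: an order-independent key per row,
# aligned with df's index so it can be passed to groupby directly.
pair_key = pd.Series(
    [", ".join(sorted(pair)) for pair in zip(df["Protein_A"], df["Protein_B"])],
    index=df.index,
    name="pair",
)

# Same joined-string result as above, using the precomputed key.
joined = df.groupby(pair_key)["Interaction"].apply(", ".join)

# Number of rows for each (pair, interaction) combination.
counts = df.groupby([pair_key, "Interaction"]).size()

print(joined)
print(counts)
```

Here counts is a Series indexed by (pair, Interaction), so counts.loc[("A1BG, ABCC6", "MI")] gives how often MI occurs between A1BG and ABCC6 regardless of column order.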
