python - Using pandas, how to find the value counts by two columns
Problem description
I have a pandas DataFrame with three columns: Protein_A, Interaction and Protein_B.
I want to find all the interactions for each pair by grouping on Protein_A and Protein_B. The order of the two proteins does not matter when grouping, so (A1BG, ABCC6) and (ABCC6, A1BG) belong to the same group.
  Protein_A Interaction Protein_B
0      A1BG          ER       A2M
1      A1BG          MI     ABCC6
2     ABCC6          AS      A1BG
3      A1BG          MI    ADAM10
4      A1BG          MI    ADAM17
The result will look something like this:
{A1BG, A2M} -> ER
{A1BG, ABCC6} -> MI, AS
{A1BG, ADAM10} -> MI
{A1BG, ADAM17} -> MI
Solution
I agree that you want to group rows, but your expected result shows that, rather than value counts, you actually want a list of interaction codes for each group.
To create such a list for each group, proceed as follows:
Start by defining a function that computes the grouping key: the two protein codes (A and B), sorted and joined into a single string:
def protSorted(key):
    # key is an index label; look up its row (assumes df has a unique index)
    row = df.loc[key]
    return ', '.join(sorted([row.Protein_A, row.Protein_B]))
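When groupby receives a function, pandas calls it once for each index label and uses the return value as that row's group key. A small check, assuming the sample data from the question (rebuilt here by hand), shows which key each row gets; note that rows 1 and 2 map to the same key even though their proteins are in opposite order:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "Protein_A": ["A1BG", "A1BG", "ABCC6", "A1BG", "A1BG"],
    "Interaction": ["ER", "MI", "AS", "MI", "MI"],
    "Protein_B": ["A2M", "ABCC6", "A1BG", "ADAM10", "ADAM17"],
})

def protSorted(key):
    # key is an index label; look up its row (assumes a unique index)
    row = df.loc[key]
    return ', '.join(sorted([row.Protein_A, row.Protein_B]))

# Mimic what groupby(protSorted) does: call the function on every index label
keys = [protSorted(label) for label in df.index]
print(keys)
# ['A1BG, A2M', 'A1BG, ABCC6', 'A1BG, ABCC6', 'A1BG, ADAM10', 'A1BG, ADAM17']
```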
Then group the source DataFrame by this function (pandas calls it on each index label), take the Interaction column from each group and collect its values into a list:
df.groupby(protSorted).Interaction.apply(list)
For your sample data, the result is the following Series:
A1BG, A2M [ER]
A1BG, ABCC6 [MI, AS]
A1BG, ADAM10 [MI]
A1BG, ADAM17 [MI]
Name: Interaction, dtype: object
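On a large DataFrame the per-row df.loc lookups inside protSorted can be slow. A minimal sketch of a vectorized alternative, assuming NumPy is available: it sorts the two protein columns within each row in one call and builds the same string key as a Series aligned with the index (the names pairs, key and result are illustrative). For the sample data it should produce the same Series as above:

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "Protein_A": ["A1BG", "A1BG", "ABCC6", "A1BG", "A1BG"],
    "Interaction": ["ER", "MI", "AS", "MI", "MI"],
    "Protein_B": ["A2M", "ABCC6", "A1BG", "ADAM10", "ADAM17"],
})

# Sort the two protein names within each row (axis=1) in a single call
pairs = np.sort(df[["Protein_A", "Protein_B"]].to_numpy(dtype=str), axis=1)

# Order-independent key, aligned with df's index so groupby can match rows
key = pd.Series([", ".join(p) for p in pairs], index=df.index)

# Same grouping as before: one list of interaction codes per unordered pair
result = df.groupby(key).Interaction.apply(list)
print(result)
```

Because the key is an ordinary Series, this version also works when the index has duplicate labels, which the df.loc lookup in protSorted does not handle.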
Or, if you want a single string for each group (without the surrounding brackets), run instead:
df.groupby(protSorted).Interaction.apply(', '.join)
This time the result is:
A1BG, A2M ER
A1BG, ABCC6 MI, AS
A1BG, ADAM10 MI
A1BG, ADAM17 MI
Name: Interaction, dtype: object
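If you do need actual value counts, as the title asks, the same order-independent key can be combined with the Interaction column. A minimal sketch, assuming the sample data from the question (rebuilt by hand); the level name Pair is an illustrative choice:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "Protein_A": ["A1BG", "A1BG", "ABCC6", "A1BG", "A1BG"],
    "Interaction": ["ER", "MI", "AS", "MI", "MI"],
    "Protein_B": ["A2M", "ABCC6", "A1BG", "ADAM10", "ADAM17"],
})

# Order-independent pair key: sort the two names in each row and join them
pair = df[["Protein_A", "Protein_B"]].apply(
    lambda r: ", ".join(sorted(r)), axis=1
).rename("Pair")

# Count how often each interaction code occurs within each unordered pair
counts = df.groupby([pair, "Interaction"]).size()
print(counts)
```

The result is a Series indexed by (Pair, Interaction); calling counts.unstack(fill_value=0) turns it into a pair-by-interaction table.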