Combining count combinations from groupby() in Pandas

Problem description

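I have a df that was created by grouping Principal Investigators and the ethics boards that may have approved their study applications. I then counted the rows with the size() method, which gives me the number of trials per ethics board for each PI.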

import pandas as pd

d = {"principal_investigator": ["Tiger Woods", "Tiger Woods", "Buzz Lightyear", "Maggie Thatcher", "Maggie Thatcher", "Seamus Heaney"],
     "board": ["CREB", "BCCA", "CREB", "CWEB", "BCCA", "CREB"],
     "counts": [2, 1, 2, 3, 1, 1]}
df = pd.DataFrame(data=d)

df

    principal_investigator  board   counts
0   Tiger Woods             CREB    2
1   Tiger Woods             BCCA    1
2   Buzz Lightyear          CREB    2
3   Maggie Thatcher         CWEB    3
4   Maggie Thatcher         BCCA    1
5   Seamus Heaney           CREB    1
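For reference, the grouping step described above might look roughly like this. The raw one-row-per-trial table is not shown in the question, so `raw` here is hypothetical: it is rebuilt by repeating each row of `df` `counts` times. Passing `sort=False` is an assumption made only so the regrouped rows come out in the same order as `df` above.

```python
import pandas as pd

d = {"principal_investigator": ["Tiger Woods", "Tiger Woods", "Buzz Lightyear", "Maggie Thatcher", "Maggie Thatcher", "Seamus Heaney"],
     "board": ["CREB", "BCCA", "CREB", "CWEB", "BCCA", "CREB"],
     "counts": [2, 1, 2, 3, 1, 1]}
df = pd.DataFrame(data=d)

# Hypothetical raw table: one row per trial, rebuilt by repeating each row `counts` times
raw = (df.loc[df.index.repeat(df["counts"].to_numpy()), ["principal_investigator", "board"]]
         .reset_index(drop=True))

# Group by PI and board, count rows with size(); sort=False keeps first-appearance order
regrouped = (raw.groupby(["principal_investigator", "board"], sort=False)
                .size()
                .reset_index(name="counts"))
print(regrouped)
```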

But I want the (board, counts) combinations for each PI instead. Something like the following would give my boss a better view:

    principal_investigator  board_counts
0   Tiger Woods             (CREB 2, BCCA 1)                
1   Buzz Lightyear          CREB    2
2   Maggie Thatcher         (CWEB 3, BCCA 1)
3   Seamus Heaney           CREB    1

I'm open to suggestions on better ways to combine these.

Tags: pandas

Solution


I'd suggest a different layout, a cross-tabulation:

>>> pd.crosstab(df["principal_investigator"], df["board"],
                values=df["counts"], aggfunc="sum").fillna(0).astype(int)

board                   BCCA  CREB  CWEB
principal_investigator
Buzz Lightyear             0     2     0
Maggie Thatcher            1     0     3
Seamus Heaney              0     1     0
Tiger Woods                1     2     0
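Because each (PI, board) pair occurs only once in `df`, the same wide table can also be sketched without an aggregation function by unstacking the `board` level. This is an alternative to `crosstab`, not what the answer above uses; `fill_value=0` plays the role of `fillna(0)`.

```python
import pandas as pd

d = {"principal_investigator": ["Tiger Woods", "Tiger Woods", "Buzz Lightyear", "Maggie Thatcher", "Maggie Thatcher", "Seamus Heaney"],
     "board": ["CREB", "BCCA", "CREB", "CWEB", "BCCA", "CREB"],
     "counts": [2, 1, 2, 3, 1, 1]}
df = pd.DataFrame(data=d)

# Each (PI, board) pair is unique, so no aggregation is needed:
# move "board" from the row index into the columns and fill missing pairs with 0
wide = (df.set_index(["principal_investigator", "board"])["counts"]
          .unstack(fill_value=0))
print(wide)
```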

To answer your question as asked (more or less):

>>> df["board"].str.cat(df["counts"].astype(str), sep=" ") \
               .groupby(df["principal_investigator"]) \
               .apply(", ".join) \
               .to_frame("board_counts")

                          board_counts
principal_investigator
Buzz Lightyear                  CREB 2
Maggie Thatcher         CWEB 3, BCCA 1
Seamus Heaney                   CREB 1
Tiger Woods             CREB 2, BCCA 1
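If the exact mock-up format from the question is wanted, a minimal sketch follows. The helper name `join_pairs` is hypothetical; it adds parentheses only when a PI has more than one board, as in the mock-up, and `sort=False` keeps the PIs in their original order rather than alphabetical.

```python
import pandas as pd

d = {"principal_investigator": ["Tiger Woods", "Tiger Woods", "Buzz Lightyear", "Maggie Thatcher", "Maggie Thatcher", "Seamus Heaney"],
     "board": ["CREB", "BCCA", "CREB", "CWEB", "BCCA", "CREB"],
     "counts": [2, 1, 2, 3, 1, 1]}
df = pd.DataFrame(data=d)

# One label per row, e.g. "CREB 2"
labels = df["board"] + " " + df["counts"].astype(str)

def join_pairs(s: pd.Series) -> str:
    """Hypothetical helper: join a PI's labels, parenthesised only if there are several."""
    text = ", ".join(s)
    return f"({text})" if len(s) > 1 else text

# Combine labels per PI; sort=False keeps the PIs in first-appearance order
board_counts = (labels.groupby(df["principal_investigator"], sort=False)
                      .agg(join_pairs)
                      .reset_index(name="board_counts"))
print(board_counts)
```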
