Count occurrences when grouping by two columns

Problem description

Suppose I have a pandas DataFrame like the following:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df["person"] = ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
                "p1", "p2", "p2", "p1", "p3"]
df["type"] = ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
              "b", "a", "b"]
df["value"] = np.random.random(15)

# Put each value into one of four quarter-width bins labelled "0.0-0.25", ...
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)
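With right=False, pd.cut uses left-closed intervals ([0, 0.25), [0.25, 0.5), ...), so a value lying exactly on an edge goes into the higher bin, and every value from np.random.random (which is always in [0, 1)) receives a label. The edge values below are hand-picked for illustration only; this is a quick sanity check, not part of the original question:

```python
import pandas as pd

bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]

# Hypothetical test values: three exact bin edges plus one value just below 1.
edges = pd.Series([0.0, 0.25, 0.5, 0.999])
result = pd.cut(edges, bins=bins, labels=labels, right=False).tolist()
print(result)  # ['0.0-0.25', '0.25-0.5', '0.5-0.75', '0.75-1.0']
```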

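I want to add a new column that holds, for each row, the number of rows with the same "person" and "type". While browsing SO I found that the following line works, but only if I leave out the last column, "bin". How can I add the "counter" column to a DataFrame that includes the "bin" column? Thanks in advance!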

df["counter"] = df.groupby(["person", "type"], as_index=False).transform("count")
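To see why that line stops working once "bin" exists, it helps to look at what transform("count") returns when no column is selected. The sketch below rebuilds the question's frame (the seeded generator is an addition, used only to make the run repeatable; the counts do not depend on the random values) and shows that the result is a DataFrame with one column per non-grouping column:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for repeatability
df = pd.DataFrame({
    "person": ["p1", "p2", "p1", "p3", "p3", "p2", "p2", "p1", "p3", "p1",
               "p1", "p2", "p2", "p1", "p3"],
    "type": ["a", "a", "a", "a", "b", "a", "a", "b", "b", "b", "a", "a",
             "b", "a", "b"],
    "value": rng.random(15),
})
bins = [0, 0.25, 0.5, 0.75, 1]
labels = [f"{float(i)}-{float(j)}" for i, j in zip(bins[:-1], bins[1:])]
df["bin"] = pd.cut(df["value"], bins=bins, labels=labels, right=False)

# With no column selected, transform runs on every non-grouping column,
# so the result has two columns here: "value" and "bin".
counts = df.groupby(["person", "type"]).transform("count")
print(list(counts.columns))  # ['value', 'bin']

# A two-column DataFrame cannot be stored in the single column "counter".
try:
    df["counter"] = counts
except ValueError as exc:
    print("assignment failed:", exc)
```

Before "bin" was added, the same call returned a one-column DataFrame (just "value"), which is why the assignment happened to work.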

Tags: python, pandas

Solution


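Without a column selection, transform("count") is applied to every non-grouping column and returns a DataFrame; once "bin" is present that DataFrame has two columns ("value" and "bin"), which cannot be assigned to the single column "counter". Just select one column, e.g. "value", before calling transform, so the result is a single Series: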

df["counter"] = df.groupby(["person", "type"], as_index=False)["value"].transform("count")

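You will get the following (the "value" column is random, so your numbers will differ, but "counter" will not):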

   person type     value       bin  counter
0      p1    a  0.134629  0.0-0.25        4
1      p2    a  0.997557  0.75-1.0        4
2      p1    a  0.911967  0.75-1.0        4
3      p3    a  0.278438  0.25-0.5        1
4      p3    b  0.539296  0.5-0.75        3
5      p2    a  0.722150  0.5-0.75        4
6      p2    a  0.724028  0.5-0.75        4
7      p1    b  0.989627  0.75-1.0        2
8      p3    b  0.978790  0.75-1.0        3
9      p1    b  0.197428  0.0-0.25        2
10     p1    a  0.330113  0.25-0.5        4
11     p2    a  0.806856  0.75-1.0        4
12     p2    b  0.430026  0.25-0.5        1
13     p1    a  0.265003  0.25-0.5        4
14     p3    b  0.037202  0.0-0.25        3
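One caveat worth knowing: "count" counts the non-missing entries of the selected column, so if "value" could contain NaN, the counter would be lower than the number of rows in the group. If the intent is to count rows regardless of missing data, "size" is an alternative (a swap, not what the answer above uses). The tiny frame below, with one deliberate NaN, is hypothetical and only illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the p1/a group has three rows, one of them with NaN.
df = pd.DataFrame({
    "person": ["p1", "p1", "p1", "p2"],
    "type": ["a", "a", "a", "a"],
    "value": [0.1, np.nan, 0.7, 0.4],
})
keys = ["person", "type"]

# "count" counts non-missing values of the selected column per group ...
df["counter"] = df.groupby(keys)["value"].transform("count")
# ... while "size" counts rows per group, missing values included.
df["rows"] = df.groupby(keys)["value"].transform("size")
print(df)
```

In the question's data "value" has no missing entries, so both give the same counter.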
