pandas - groupby and count across multiple DataFrame columns
Question
I have a DataFrame:
import pandas as pd

df = pd.DataFrame([
    [1, 1, 'A', 10],
    [4, 1, 'A', 6],
    [7, 2, 'A', 3],
    [2, 2, 'A', 4],
    [6, 2, 'B', 9],
    [5, 2, 'B', 7],
    [5, 1, 'B', 12],
    [5, 1, 'B', 4],
    [5, 2, 'C', 9],
    [5, 1, 'C', 3],
    [5, 1, 'C', 4],
    [5, 2, 'C', 7]
],
index=['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
columns=['A', 'B', 'C', 'D'])
I can count the non-zero values of column D, grouped by column A, with:
df['countTrans'] = df['D'].ne(0).groupby(df['A']).transform('sum')
The output is:
df:
A B C D countTrans
A 1 1 A 10 1.0
A 4 1 A 0 0.0
A 7 2 A 3 1.0
A 2 2 A 4 1.0
A 6 2 B 9 1.0
A 5 2 B 7 7.0
A 5 1 B 12 7.0
A 5 1 B 4 7.0
A 5 2 C 9 7.0
A 5 1 C 3 7.0
A 5 1 C 4 7.0
A 5 2 C 7 7.0
However, I want to group not only by column A but also by column B. I tried the following variants:
df['countTrans'] = df['D'].ne(0).groupby(df['A'], df['B']).transform('sum')
df['countTrans'] = df['D'].ne(0).groupby(df['A','B']).transform('sum')
Neither worked.
The output I want looks like this:
df:
A B C D countTrans
A 1 1 A 10 1.0
A 4 1 A 0 0.0
A 7 2 A 3 1.0
A 2 2 A 4 1.0
A 6 2 B 9 1.0
A 5 2 B 7 3.0
A 5 1 B 12 4.0
A 5 1 B 4 4.0
A 5 2 C 9 3.0
A 5 1 C 3 4.0
A 5 1 C 4 4.0
A 5 2 C 7 3.0
Solution
One possible solution is to pass the Series in a list:
df['countTrans'] = df['D'].ne(0).groupby([df['A'], df['B']]).transform('sum')
print(df)
A B C D countTrans
A 1 1 A 10 1
A 4 1 A 6 1
A 7 2 A 3 1
A 2 2 A 4 1
A 6 2 B 9 1
A 5 2 B 7 3
A 5 1 B 12 4
A 5 1 B 4 4
A 5 2 C 9 3
A 5 1 C 3 4
A 5 1 C 4 4
A 5 2 C 7 3
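As for why the attempts in the question fail: in `groupby(df['A'], df['B'])` the second Series lands in groupby's second positional parameter rather than in `by`, so it is not treated as a grouping key. And `df['A','B']` is read as a lookup of one column whose label is the tuple `('A', 'B')`, which this frame does not have. A minimal sketch of the second point, using a hypothetical three-row frame called `small` (not the data from the question):

```python
import pandas as pd

# Hypothetical miniature frame, used only to illustrate the indexing behaviour.
small = pd.DataFrame({'A': [1, 1, 5], 'B': [1, 2, 2], 'D': [10, 0, 7]})

# small['A', 'B'] asks for ONE column labelled ('A', 'B'); no such column exists.
try:
    small['A', 'B']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list of labels selects several columns and returns a DataFrame.
both = small[['A', 'B']]

# Multiple grouping keys go into a single list passed as the first argument.
counts = small['D'].ne(0).groupby([small['A'], small['B']]).transform('sum')

print(tuple_lookup_failed)
print(counts.tolist())
```

Each (A, B) pair in `small` forms its own group, so `counts` simply records whether that row's D is non-zero.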
Alternatively, create a helper column with DataFrame.assign (which I think is 'cleaner'):
df['countTrans'] = df.assign(E = df['D'].ne(0)).groupby(['A','B'])['E'].transform('sum')
# a similar solution that overwrites D in the temporary copy instead of adding E
# df['countTrans'] = df.assign(D = df['D'].ne(0)).groupby(['A','B'])['D'].transform('sum')
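To check that the list-of-Series variant and the assign variant agree, here is a self-contained sketch that rebuilds the frame from the question and compares the two results. The names `nonzero`, `via_series` and `via_assign` are introduced here for illustration and do not appear in the original answer.

```python
import pandas as pd

# Same data, index and columns as the frame in the question.
df = pd.DataFrame(
    [
        [1, 1, 'A', 10], [4, 1, 'A', 6], [7, 2, 'A', 3], [2, 2, 'A', 4],
        [6, 2, 'B', 9], [5, 2, 'B', 7], [5, 1, 'B', 12], [5, 1, 'B', 4],
        [5, 2, 'C', 9], [5, 1, 'C', 3], [5, 1, 'C', 4], [5, 2, 'C', 7],
    ],
    index=['A'] * 12,
    columns=['A', 'B', 'C', 'D'],
)

# Boolean mask: True where D is non-zero.
nonzero = df['D'].ne(0)

# Variant 1: group the mask by a list of Series.
via_series = nonzero.groupby([df['A'], df['B']]).transform('sum')

# Variant 2: add the mask as helper column E, then group by column labels.
via_assign = df.assign(E=nonzero).groupby(['A', 'B'])['E'].transform('sum')

# Compare values only; the two Series carry different names (D and E).
same = via_series.tolist() == via_assign.tolist()
print(same)
```

Note that `df.assign` returns a new frame, so the helper column E never shows up in `df` itself; only `countTrans` is added when the result is assigned back.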