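python - Computing an arbitrary number of different groupby levels in one go
Question
Is there a way to compute an arbitrary number of different groupby levels in one go using some pre-built Pandas function? Below is a simple example with two columns.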
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "dollars": [1, 1, 1, 1, 1, 1],
})

# Subtotals per city, with name filled in as 'All'
group1 = df1.groupby("city").dollars.sum().reset_index()
group1['name'] = 'All'

# Subtotals per name, with city filled in as 'All'
group2 = df1.groupby("name").dollars.sum().reset_index()
group2['city'] = 'All'

# Finest level: one row per (name, city) pair
group3 = df1.groupby(["name", "city"]).dollars.sum().reset_index()

# Grand total as a single-row frame
total = df1.dollars.sum()
total_df = pd.DataFrame({
    "name": ["All"],
    "city": ["All"],
    "dollars": [total],
})

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
all_groups = pd.concat([group3, group1, group2, total_df], sort=False)
      name      city  dollars
0    Alice   Seattle        1
1      Bob   Seattle        2
2  Mallory  Portland        2
3  Mallory   Seattle        1
0      All  Portland        2
1      All   Seattle        4
0    Alice       All        1
1      Bob       All        2
2  Mallory       All        3
0      All       All        6
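So I took Ben.T's example and rebuilt it from sum() to agg(). The next step for me is to build an option to pass a specific list of groupby combinations, in case not all of them are needed.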
from itertools import combinations
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "dollars": [1, 2, 6, 5, 3, 4],
    "qty": [2, 3, 4, 1, 5, 6],
    "id": [1, 1, 2, 2, 3, 3],
})

col_gr = ['name', 'city']
agg_func = {'dollars': ['sum', 'max', 'count'], 'qty': ['sum'], "id": ['nunique']}

def multi_groupby(in_df, col_gr, agg_func, all_value="ALL"):
    tmp1 = pd.DataFrame({**{col: all_value for col in col_gr}}, index=[0])
    tmp2 = in_df.agg(agg_func)\
        .unstack()\
        .to_frame()\
        .transpose()\
        .dropna(axis=1)
    tmp2.columns = ['_'.join(col).strip() for col in tmp2.columns.values]
    total = tmp1.join(tmp2)
    for r in range(len(col_gr), 0, -1):
        for cols in combinations(col_gr, r):
            tmp_grp = in_df.groupby(by=list(cols))\
                .agg(agg_func)\
                .reset_index()\
                .assign(**{col: all_value for col in col_gr if col not in cols})
            tmp_grp.columns = ['_'.join(col).rstrip('_') for col in tmp_grp.columns.values]
            total = pd.concat([total]+[tmp_grp], axis=0, ignore_index=True)
    return total

multi_groupby(df1, col_gr, agg_func)
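The "next step" mentioned above, passing only the groupby combinations that are actually needed, could look roughly like the sketch below. It is not part of the original post: the helper name groupby_levels and its combos parameter are made up for illustration, and it uses a plain sum() instead of agg() to keep the example short. An empty tuple in combos stands for the grand total.

```python
from itertools import combinations

import pandas as pd


def groupby_levels(in_df, col_gr, col_sum, combos=None, all_value="All"):
    """Sum col_sum for each requested subset of col_gr (hypothetical helper)."""
    if combos is None:
        # Default: every subset of col_gr, ending with the empty one (grand total).
        combos = [c for r in range(len(col_gr), -1, -1)
                  for c in combinations(col_gr, r)]
    frames = []
    for cols in combos:
        cols = list(cols)
        if cols:
            part = in_df.groupby(cols)[col_sum].sum().reset_index()
        else:
            # Empty subset: a single grand-total row.
            part = in_df[col_sum].sum().to_frame().T
        # Fill the columns that were not grouped on with the placeholder value.
        for col in col_gr:
            if col not in cols:
                part[col] = all_value
        # Put the columns in a fixed order before stacking the pieces.
        frames.append(part[col_gr + col_sum])
    return pd.concat(frames, ignore_index=True)


df = pd.DataFrame({
    "name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "city": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "dollars": [1, 1, 1, 1, 1, 1],
})

# Only the per-city subtotals and the grand total.
result = groupby_levels(df, ["name", "city"], ["dollars"], combos=[("city",), ()])
print(result)
```

With combos left at None, the helper falls back to every combination, which for the sample data gives the same ten rows as the manual version in the question.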
Answer
Assuming you are looking for a general way to create all the column combinations for the groupby, you can use itertools.combinations:
from itertools import combinations

import pandas as pd

# df1 is the DataFrame defined in the question
col_gr = ['name', 'city']
col_sum = ['dollars']

all_groups = pd.concat(
    # One grouped frame per non-empty subset of col_gr, largest subsets first
    [df1.groupby(by=list(cols))[col_sum].sum().reset_index()
        .assign(**{col: 'all' for col in col_gr if col not in cols})
     for r in range(len(col_gr), 0, -1) for cols in combinations(col_gr, r)]
    # Plus a single row holding the grand total
    + [pd.DataFrame({**{col: 'all' for col in col_gr},
                     **{col: df1[col].sum() for col in col_sum}}, index=[0])],
    axis=0, ignore_index=True)
print(all_groups)
      name      city  dollars
0    Alice   Seattle        1
1      Bob   Seattle        2
2  Mallory  Portland        2
3  Mallory   Seattle        1
4    Alice       all        1
5      Bob       all        2
6  Mallory       all        3
7      all  Portland        2
8      all   Seattle        4
9      all       all        6
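To see why the rows come out in that order, it may help to look at the column subsets the nested comprehension walks through. The snippet below is only an illustration (the variable name subsets is not from the answer). The empty subset is never produced because the range stops at 1, which is why the grand total row is built separately.

```python
from itertools import combinations

col_gr = ['name', 'city']

# Same iteration order as in the answer: larger subsets first.
subsets = [cols
           for r in range(len(col_gr), 0, -1)
           for cols in combinations(col_gr, r)]
print(subsets)  # [('name', 'city'), ('name',), ('city',)]
```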