python - User-defined function with arbitrary number of grouping variables in pandas
Problem description
My data looks like this:
import pandas as pd

df = pd.DataFrame({
    'cat_1': ['A'] * 3 + ['B'] * 3,
    'cat_2': ['x', 'y', 'z'] * 2,
    'value': [1, 2, 3, 4, 5, 6]
})
I want to create a function that groups (with a variable number of groups) and sums my data. For instance, the following functions achieve this end for one and two groups, respectively.
def grp_and_sum(data, grp_var, sum_var):
    df = data.groupby([grp_var])[sum_var]\
        .sum()
    return(df)

def grp_and_sum_2(data, grp_var1, grp_var2, sum_var):
    df = data.groupby([grp_var1, grp_var2])[sum_var]\
        .sum()
    return(df)
These functions are more-or-less identical save for the variable number of 'grouping' variables. How do I generalize the first function to accept an arbitrary number of grouping variables? Thank you.
Solution
You can use varargs (*args) for the groupers. Because sum_var then comes after *args in the signature, it is keyword-only and has to be passed in by name.
def grp_and_sum_n(data, *args, sum_var):
    return data.groupby([*args])[sum_var].sum()

grp_and_sum_n(df, 'cat_2', sum_var='value')

cat_2
x    5
y    7
z    9
Name: value, dtype: int64

grp_and_sum_n(df, 'cat_1', 'cat_2', sum_var='value')

cat_1  cat_2
A      x        1
       y        2
       z        3
B      x        4
       y        5
       z        6
Name: value, dtype: int64
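
If you would rather keep sum_var positional, one alternative is to pass the groupers as a single list instead of varargs. The following is only a sketch of that idea: the name grp_and_sum_list and the str check are my own assumptions and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'cat_1': ['A'] * 3 + ['B'] * 3,
    'cat_2': ['x', 'y', 'z'] * 2,
    'value': [1, 2, 3, 4, 5, 6],
})

def grp_and_sum_list(data, grp_vars, sum_var):
    # Hypothetical variant: grp_vars may be one column name or a list of names.
    if isinstance(grp_vars, str):
        grp_vars = [grp_vars]
    # groupby accepts a list of column labels, so any number of groupers works.
    return data.groupby(list(grp_vars))[sum_var].sum()

# sum_var can now be given positionally.
by_one = grp_and_sum_list(df, 'cat_2', 'value')
by_two = grp_and_sum_list(df, ['cat_1', 'cat_2'], 'value')
```

The trade-off is that callers have to wrap multiple groupers in a list, but the signature stays close to the original grp_and_sum.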