User-defined function with arbitrary number of grouping variables in pandas

Question

My data looks like this:

import pandas as pd

df = pd.DataFrame({
    'cat_1': ['A'] * 3 + ['B'] * 3,
    'cat_2': ['x', 'y', 'z'] * 2,
    'value': [1, 2, 3, 4, 5, 6]
})

I want to create a function that groups my data by a variable number of columns and sums a value column. For instance, the following functions do this for one and two grouping variables, respectively.

def grp_and_sum(data, grp_var, sum_var):
    # Group by one column and sum sum_var within each group
    return data.groupby([grp_var])[sum_var].sum()

def grp_and_sum_2(data, grp_var1, grp_var2, sum_var):
    # Group by two columns and sum sum_var within each group
    return data.groupby([grp_var1, grp_var2])[sum_var].sum()

These functions are identical except for the number of grouping variables. How do I generalize the first function to accept an arbitrary number of grouping variables? Thank you.

Tags: python, pandas, pandas-groupby

Solution


You can use varargs (*args) for the grouping columns, but then sum_var has to be passed as a keyword argument, because any parameter declared after *args is keyword-only.
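The reason sum_var must be passed by name is that *args collects every positional argument, so a positional value intended for sum_var would be swallowed into args. The following is a minimal standalone sketch of that Python behaviour; the function demo is hypothetical and not part of the answer.

```python
def demo(*args, sum_var):
    # args collects every positional argument; sum_var can only be set by name
    return args, sum_var

print(demo("cat_1", "cat_2", sum_var="value"))  # (('cat_1', 'cat_2'), 'value')

try:
    # "value" ends up in args, so the required keyword-only sum_var is missing
    demo("cat_1", "cat_2", "value")
except TypeError as exc:
    print("TypeError:", exc)
```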

def grp_and_sum_n(data, *args, sum_var):
    return data.groupby([*args])[sum_var].sum()

grp_and_sum_n(df, 'cat_2', sum_var='value')
cat_2
x    5
y    7
z    9
Name: value, dtype: int64

grp_and_sum_n(df, 'cat_1', 'cat_2', sum_var='value')
cat_1  cat_2
A      x        1
       y        2
       z        3
B      x        4
       y        5
       z        6
Name: value, dtype: int64
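If you would rather keep sum_var positional, a common alternative is to pass the grouping columns as a single list, since DataFrame.groupby already accepts a list of column names. This is a minimal sketch; grp_and_sum_cols is a hypothetical name, and wrapping a bare string in a list is a convenience assumption, not something the answer requires.

```python
import pandas as pd

def grp_and_sum_cols(data, grp_vars, sum_var):
    # Hypothetical helper: grp_vars may be one column name or a list/tuple of names.
    # A bare string is wrapped in a list so both call styles work.
    if isinstance(grp_vars, str):
        grp_vars = [grp_vars]
    return data.groupby(list(grp_vars))[sum_var].sum()

df = pd.DataFrame({
    'cat_1': ['A'] * 3 + ['B'] * 3,
    'cat_2': ['x', 'y', 'z'] * 2,
    'value': [1, 2, 3, 4, 5, 6]
})

print(grp_and_sum_cols(df, 'cat_2', 'value'))
print(grp_and_sum_cols(df, ['cat_1', 'cat_2'], 'value'))
```

With this signature, sum_var can be given positionally, and the two calls should produce the same Series as the varargs examples above.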
