Parameterizing a pandas groupby

Question

Is there a way to parameterize a pandas groupby by passing in variables instead of hard-coded lists?

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"
df = pd.read_csv(input_file_name)
df_total = df.groupby([group_by_cols])[aggregate_cols].sum()

Is this possible?

Tags: python, pandas

Solution


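If you want to pass lists, define the variables as lists and remove the [] around group_by_cols in the groupby call; otherwise [group_by_cols] becomes a nested list:

import pandas as pd

# square brackets added, so these are lists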
group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]

print (type(group_by_cols))
<class 'list'>

df = pd.read_csv(input_file_name)
df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
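
To see why the extra brackets matter, here is a small sketch in plain Python (no pandas needed). It only illustrates how the values from the question are built; the variable names nested and flat are made up for this illustration:

```python
# Comma-separated values without brackets form a tuple in Python.
group_by_cols = "id", "week_number"
print(type(group_by_cols).__name__)  # tuple

# Wrapping the tuple in [] gives a list with ONE element (the whole tuple),
# not a list of two column names.
nested = [group_by_cols]
print(nested)       # [('id', 'week_number')]
print(len(nested))  # 1

# list() unpacks the tuple into a flat list of column names instead.
flat = list(group_by_cols)
print(flat)         # ['id', 'week_number']
```

The flat list is the form that groupby is given in the corrected code above.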

Alternatively, convert tuples to lists if the input is written like this:

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"

which is the same as writing the tuples explicitly:

group_by_cols = ("id","week_number")
aggregate_cols = ("col1","col2","col3")

print (type(group_by_cols))
<class 'tuple'>

df = pd.read_csv(input_file_name)
df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()

Test with sample data:

df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9]
})


group_by_cols = ["id","week_number"]
aggregate_cols = ["col1","col2","col3"]

df_total = df.groupby(group_by_cols)[aggregate_cols].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6

group_by_cols = "id","week_number"
aggregate_cols = "col1","col2","col3"

df_total = df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()
print (df_total)
                col1  col2  col3
id week_number                  
a  4              16     6    11
   5              12    10    12
b  5               5     1     6
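
If the column names arrive from configuration or function arguments, the conversion can be wrapped in a small helper. The following is only a sketch: sum_by is a hypothetical name, not part of pandas, and it assumes that every column name is a string and that a single name may be passed as a bare string.

```python
import pandas as pd

def sum_by(df, group_by_cols, aggregate_cols):
    """Hypothetical helper: group df by group_by_cols and sum aggregate_cols."""
    # list("id") would split the string into characters, so wrap a bare string first.
    if isinstance(group_by_cols, str):
        group_by_cols = [group_by_cols]
    if isinstance(aggregate_cols, str):
        aggregate_cols = [aggregate_cols]
    # list() accepts tuples, lists and other iterables of column names.
    return df.groupby(list(group_by_cols))[list(aggregate_cols)].sum()

# Same sample data as above.
df = pd.DataFrame({
    'id': list('aaaabb'),
    'week_number': [4, 5, 4, 5, 5, 5],
    'col1': [7, 8, 9, 4, 2, 3],
    'col2': [1, 3, 5, 7, 1, 0],
    'col3': [5, 3, 6, 9, 2, 4],
    'col4': [4, 3, 3, 0, 3, 9]
})

from_tuples = sum_by(df, ("id", "week_number"), ("col1", "col2", "col3"))
from_lists = sum_by(df, ["id", "week_number"], ["col1", "col2", "col3"])
print(from_tuples.equals(from_lists))  # True
```

With this, callers can pass tuples or lists interchangeably and get the same result as in the test above.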
