How do I apply pandas groupby to multiple columns in Python and aggregate the other columns into a list of tuples?

Question

I have a pandas DataFrame, say:

import pandas as pd

data = {"action"  : ["create_ticket", "create_ticket", "create_ticket"],
        "start"   : ["2016-01-02", "2016-01-02", "2016-01-21"],
        "end"     : ["2016-01-04", "2016-01-05", "2016-01-28"],
        "duration": [2, 3, 7]
       }

df = pd.DataFrame(data, columns=["action", "start", "end", "duration"])

It looks like this:

    action          start       end         duration
0   create_ticket   2016-01-02  2016-01-04  2
1   create_ticket   2016-01-02  2016-01-05  3
2   create_ticket   2016-01-21  2016-01-28  7

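Now I want to group by the first two columns (action and start) and then aggregate the other two columns (end and duration) into a list of tuples. So my desired output looks like: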

    action          start       endpoints
0   create_ticket   2016-01-02  [(2016-01-04, 2), (2016-01-05, 3)]
2   create_ticket   2016-01-21  [(2016-01-28, 7)]

I tried:

df = df.groupby(['action', 'start'])['end', 'duration'].apply(list).to_frame()
df.reset_index(inplace=True)

But this gives:

    action          start       0
0   create_ticket   2016-01-02  [end, duration]
1   create_ticket   2016-01-21  [end, duration]

How can I fix this?

Tags: python, pandas, pandas-groupby

Answer

Use apply together with df.values:

In [43]: df.groupby(['action', 'start'])[['end', 'duration']].apply(lambda x: tuple(x.values))
Out[43]: 
action         start     
create_ticket  2016-01-02    ([2016-01-04, 2], [2016-01-05, 3])
               2016-01-21                    ([2016-01-28, 7],)
dtype: object
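
Note that the result above holds a tuple of arrays per group, not a list of tuples, and the column is still unnamed. The original attempt returned [end, duration] because calling list on a DataFrame iterates over its column labels rather than its rows. If the exact shape from the question is wanted, one possible sketch (an alternative, not the method from the answer above) is to zip the two columns into tuples first and then collect them per group. The helper column name endpoint is an assumption introduced only for illustration.

```python
import pandas as pd

data = {
    "action": ["create_ticket", "create_ticket", "create_ticket"],
    "start": ["2016-01-02", "2016-01-02", "2016-01-21"],
    "end": ["2016-01-04", "2016-01-05", "2016-01-28"],
    "duration": [2, 3, 7],
}
df = pd.DataFrame(data, columns=["action", "start", "end", "duration"])

# Build one (end, duration) tuple per row in a helper column.
# "endpoint" is a hypothetical name chosen for this sketch.
pairs = df.assign(endpoint=list(zip(df["end"], df["duration"])))

# Collect the tuples of each (action, start) group into a list,
# then turn the group keys back into ordinary columns.
result = (
    pairs.groupby(["action", "start"])["endpoint"]
    .agg(list)
    .reset_index(name="endpoints")
)

print(result)
```

Here each cell of result["endpoints"] is a plain Python list of (end, duration) tuples, and reset_index(name="endpoints") gives the aggregated column the name used in the desired output. Unlike the desired output in the question, the row index is renumbered from 0.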
