Complex conditional aggregation in Pandas

Problem description

In this table, I want to find the average number of days between actions for each user.

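In other words, I want to group by User_ID, subtract from each action's date the date of that user's immediately preceding action, and then average these gaps for each user (each user's average number of No_Action days).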

+---------+-----------+----------------------+
| User_ID | Action_ID | Action_At            |
+---------+-----------+----------------------+
| 1       | 11        | 2019-01-31T23:00:37Z |
+---------+-----------+----------------------+
| 2       | 12        | 2019-01-31T23:11:12Z |
+---------+-----------+----------------------+
| 3       | 13        | 2019-01-31T23:14:53Z |
+---------+-----------+----------------------+
| 1       | 14        | 2019-02-01T00:00:30Z |
+---------+-----------+----------------------+
| 2       | 15        | 2019-02-01T00:01:03Z |
+---------+-----------+----------------------+
| 3       | 16        | 2019-02-01T00:02:32Z |
+---------+-----------+----------------------+
| 1       | 17        | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 2       | 18        | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 3       | 19        | 2019-02-07T09:09:16Z |
+---------+-----------+----------------------+
| 1       | 20        | 2019-02-11T15:37:24Z |
+---------+-----------+----------------------+
| 2       | 21        | 2019-02-18T10:02:07Z |
+---------+-----------+----------------------+
| 3       | 22        | 2019-02-26T12:01:31Z |
+---------+-----------+----------------------+
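To make the definition concrete, here is user 1's calculation done by hand with only the Python standard library. This is a minimal sketch; the four timestamps are copied from the table, and the trailing Z is matched literally and treated as UTC. It also shows that, because consecutive gaps telescope, the mean gap equals (last - first) / (n - 1), which is a handy cross-check for whatever pandas returns.

```python
from datetime import datetime, timedelta

FMT = '%Y-%m-%dT%H:%M:%SZ'  # the trailing Z is matched literally (UTC)

# User 1's action times from the table, in chronological order
user1 = ['2019-01-31T23:00:37Z', '2019-02-01T00:00:30Z',
         '2019-02-06T11:30:28Z', '2019-02-11T15:37:24Z']
times = [datetime.strptime(s, FMT) for s in user1]

# Gap between each action and the same user's previous action
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

# Average gap: sum the timedeltas, then divide by the number of gaps
mean_gap = sum(gaps, timedelta()) / len(gaps)

# The gaps telescope, so the mean also equals (last - first) / (n - 1)
shortcut = (times[-1] - times[0]) / (len(times) - 1)

print(gaps)
print(mean_gap, mean_gap == shortcut)
```

For user 1 this gives 3 days, 13:32:15.666667, which agrees with the first row of the pandas output in the solution up to sub-microsecond rounding.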

Tags: python-3.x, pandas

Solution


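You can do it like this (next time, please provide the data so that it is easier to help you; typing in the data took me much longer than finding the solution):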

import pandas as pd

df = pd.DataFrame({'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                   'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
                   'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z', '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z', '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z', '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z']})

# Parse the ISO-8601 strings (the trailing Z means UTC) into datetimes
df.Action_At = pd.to_datetime(df.Action_At)

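# Within each user, subtract the previous action's timestamp from each one
# (shift() moves the column down one row), then average the gaps.
# This assumes each user's rows are already in chronological order.
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).mean())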

## User_ID
## 1   3 days 13:32:15.666666
## 2   5 days 19:36:58.333333
## 3   8 days 12:15:32.666666
## dtype: timedelta64[ns]
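On recent pandas versions, calling apply on the whole grouped DataFrame may emit a deprecation warning about the grouping column. A sketch that avoids apply altogether uses groupby(...).diff(), which subtracts the previous row within each group, and then takes the per-group mean. The Gap column name is chosen here for illustration, and the explicit sort is an addition, since the original answer relies on the rows already being in time order.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
    'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z',
                  '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z',
                  '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z',
                  '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z'],
})
df['Action_At'] = pd.to_datetime(df['Action_At'])

# Put each user's actions in chronological order, so that
# "previous row" really means "previous action"
df = df.sort_values(['User_ID', 'Action_At'])

# diff() subtracts the previous timestamp within each User_ID group;
# each user's first action has no predecessor and gets NaT
df['Gap'] = df.groupby('User_ID')['Action_At'].diff()

# mean() skips NaT, so only the real gaps are averaged
mean_gap = df.groupby('User_ID')['Gap'].mean()
print(mean_gap)
```

The result is a timedelta per user, like the apply version, but each step is a vectorized groupby operation rather than a Python lambda called once per group.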

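Or, if you want the result as a number of days (note that .dt.days keeps only the whole-day part of each gap, so every gap is truncated to whole days before averaging):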

df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).dt.days.mean())

## User_ID
## 1    3.333333
## 2    5.333333
## 3    8.333333
## dtype: float64
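If the gaps should count in fractional days rather than whole days, one option (a sketch, not part of the original answer) is to convert each gap with Series.dt.total_seconds() and divide by 86,400 seconds per day. Unlike .dt.days, this keeps the hours, minutes and seconds of each gap, so user 1's average comes out near 3.56 days instead of 3.33.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
    'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z',
                  '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z',
                  '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z',
                  '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z'],
})
df['Action_At'] = pd.to_datetime(df['Action_At'])
df = df.sort_values(['User_ID', 'Action_At'])

# Gap to the same user's previous action (NaT for each user's first action)
gaps = df.groupby('User_ID')['Action_At'].diff()

# total_seconds() keeps the fractional part of every gap (NaT becomes NaN);
# dividing by 86,400 seconds per day converts it to fractional days
gap_days = gaps.dt.total_seconds() / 86400

# Average per user; mean() skips the NaN left by each user's first action
mean_gap_days = gap_days.groupby(df['User_ID']).mean()
print(mean_gap_days.round(3))
```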
