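python - Python pandas: How to find a time difference based on the maximum value of another column?
Problem description
I have been trying to find, for each person in this dataset, the time spent on the activity they were most involved in: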
name activity timestamp money_spent
0 Chandler Bing party 2017-08-04 08:00:00 51
1 Chandler Bing party 2017-08-04 13:00:00 60
2 Chandler Bing party 2017-08-04 15:00:00 59
5 Harry Kane party 2017-08-04 07:00:00 68
4 Harry Kane party 2017-08-04 11:00:00 90
3 Harry Kane football 2017-08-04 13:00:00 80
11 Joey Tribbiani football 2017-08-04 08:00:00 84
9 Joey Tribbiani party 2017-08-04 09:00:00 54
10 Joey Tribbiani party 2017-08-04 10:00:00 67
6 John Doe beach 2017-08-04 07:00:00 63
7 John Doe beach 2017-08-04 12:00:00 61
8 John Doe beach 2017-08-04 14:00:00 65
12 Monica Geller travel 2017-08-04 07:00:00 90
13 Monica Geller travel 2017-08-04 08:00:00 96
14 Monica Geller travel 2017-08-04 09:00:00 74
15 Phoebe Buffey travel 2017-08-04 10:00:00 52
16 Phoebe Buffey travel 2017-08-04 12:00:00 84
17 Phoebe Buffey football 2017-08-04 15:00:00 58
18 Ross Geller party 2017-08-04 09:00:00 96
19 Ross Geller party 2017-08-04 11:00:00 81
20 Ross Geller travel 2017-08-04 14:00:00 60
df['timestamp'] = pd.to_datetime(df.timestamp, format='%Y-%m-%d %H:%M:%S')
df # party day 2017-08-04 for some guys.
# find most involved activity and time spent on that activity per person.
Desired output:
activity_num activity time_diff
name
Chandler Bing 1.0 party 07:00:00
Harry Kane 2.0 party 04:00:00
Joey Tribbiani 2.0 party 02:00:00
John Doe 1.0 beach 07:00:00
Monica Geller 1.0 travel 02:00:00
Phoebe Buffey 2.0 travel 03:00:00
Ross Geller 2.0 travel 03:00:00
Note: Harry Kane was at the party from 7 am to 11 am, so his answer is 4 hours.
df.head()
name activity timestamp money_spent
0 Chandler Bing party 2017-08-04 08:00:00 51
1 Chandler Bing party 2017-08-04 13:00:00 60
2 Chandler Bing party 2017-08-04 15:00:00 59
3 Harry Kane football 2017-08-04 13:00:00 80
4 Harry Kane party 2017-08-04 11:00:00 90
5 Harry Kane party 2017-08-04 07:00:00 68
My attempt:
df.groupby(['name','activity'])['timestamp'].max() # no idea
Solution
Try this:
gb = df.groupby(['name', 'activity'])['timestamp']
print((gb.max() - gb.min()).sort_values(ascending=False).reset_index().drop_duplicates(subset='name'))
Output:
name activity timestamp
0 John Doe beach 07:00:00
1 Chandler Bing party 07:00:00
2 Harry Kane party 04:00:00
3 Ross Geller party 02:00:00
4 Phoebe Buffey travel 02:00:00
5 Monica Geller travel 02:00:00
6 Joey Tribbiani party 01:00:00
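Note that the snippet above keeps, for each person, the activity with the longest time span, which is not necessarily the activity with the most rows. If "most involved" is taken to mean "most rows" (an assumption, the question does not say so explicitly), a minimal sketch could count rows per (name, activity) first and then read the span from the winning group. The column names n_records, start and end are made up for this illustration, and the data is a hand-built three-person subset of the question's table:

```python
import pandas as pd

# Hand-built subset of the question's data (three people only).
df = pd.DataFrame({
    "name": ["Harry Kane"] * 3 + ["Joey Tribbiani"] * 3 + ["John Doe"] * 3,
    "activity": ["party", "party", "football",
                 "football", "party", "party",
                 "beach", "beach", "beach"],
    "timestamp": pd.to_datetime([
        "2017-08-04 07:00:00", "2017-08-04 11:00:00", "2017-08-04 13:00:00",
        "2017-08-04 08:00:00", "2017-08-04 09:00:00", "2017-08-04 10:00:00",
        "2017-08-04 07:00:00", "2017-08-04 12:00:00", "2017-08-04 14:00:00",
    ]),
})

# Per (name, activity): row count, earliest and latest timestamp.
# n_records, start and end are illustrative names, not from the question.
stats = df.groupby(["name", "activity"])["timestamp"].agg(
    n_records="count", start="min", end="max"
)
stats["time_diff"] = stats["end"] - stats["start"]

# For each person, keep the activity with the most rows.
# Ties are resolved by the stable sort order, which is an arbitrary choice here.
result = (
    stats.reset_index()
    .sort_values("n_records", ascending=False, kind="stable")
    .drop_duplicates(subset="name")
    .set_index("name")
    .sort_index()[["n_records", "activity", "time_diff"]]
)
print(result)
```

On this subset both approaches pick the same activities; they would differ for a person who has, say, two rows far apart for one activity and three rows close together for another.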