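python - How do I get a column value from the idxmin of another row and column?
Problem description
I apologize if the title does not state the question well.
This is very similar to a question I asked here before. I got a solution for that one, but now I am facing a different problem.
I have the following dataframe: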
import pandas as pd

df = pd.DataFrame({'person':['A', 'A', 'B', 'B', 'A', 'C', 'B', 'C'],
                   'datetime':['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10',
                               '2018-02-26 10:58:45', '2018-02-26 10:43:34', '2018-02-26 10:49:51',
                               '2018-02-26 10:51:51', '2018-02-26 10:55:10'],
                   'location':['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
person datetime location
A 2018-02-26 10:49:32 a
A 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c
B 2018-02-26 10:58:45 d
A 2018-02-26 10:43:34 e
C 2018-02-26 10:49:51 f
B 2018-02-26 10:51:51 g
C 2018-02-26 10:55:10 h
If we sort by person and datetime, we get
df.sort_values(by=['person', 'datetime'])
person datetime location
A 2018-02-26 10:43:34 e
A 2018-02-26 10:49:32 a
A 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c
B 2018-02-26 10:51:51 g
B 2018-02-26 10:58:45 d
C 2018-02-26 10:49:51 f
C 2018-02-26 10:55:10 h
You can read this as: person A went to location 'e', then to 'a', then to 'b'. Likewise, person B went to 'c', then 'g', then 'd', and so on.
I want to turn it into this:
person prev_datetime prev_loc curr_datetime curr_loc next_datetime next_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b 2018-02-26 10:43:34 e
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d 2018-02-26 10:51:10 c
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h 2018-02-26 10:49:51 f
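So person A goes to location 'e', then 'a', then 'b', and at the end we have the person return to 'e' (the start).
I applied what I learned from my previous question, with some modifications.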
df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])
df1[['curr_datetime','curr_loc']] = df1.groupby('person')[['datetime','location']].shift(-1)
d = {'datetime':'prev_datetime','location':'prev_loc'}
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
Output
person prev_datetime prev_loc curr_datetime curr_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h
Then
df2[['next_datetime','next_loc']] = df2.groupby('person')[['curr_datetime','curr_loc']].shift(-1)
Output
person prev_datetime prev_loc curr_datetime curr_loc next_datetime next_loc
A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b NaT nan
B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d NaT nan
C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h NaT nan
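Some of these are NaN by default, so I tried the following. I want to fill these NaN values with the row at each person's min() datetime, that is, the start of their trip.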
df2.loc[df2["next_loc"].isnull()][['next_datetime', 'next_loc']] = df2.loc[df2.groupby("person")["prev_datetime"].idxmin()][['prev_datetime', 'prev_loc']]
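This does not work. There is no error, but the dataframe still does not show the expected data (the values are still NaN). I am confused because there is no error message, yet it still does not work. I do not know whether there is a better solution than this. Thanks.
Solution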
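A likely reason the fill in the question has no effect is chained indexing: `df2.loc[mask][cols] = ...` first builds a new object from `df2.loc[mask]` and then assigns into that temporary object, so `df2` itself is never touched. The right-hand side is also labelled by the idxmin rows, which are not the rows being filled, and pandas aligns on index labels when assigning. The following is a minimal sketch of one way to do the fill, assuming the goal is the "return to the start" table from the question. It passes the row mask and the column to a single `.loc` call and looks up each person's earliest row by `person`. The names `start_idx`, `start` and `mask` are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({'person': ['A', 'A', 'B', 'B', 'A', 'C', 'B', 'C'],
                   'datetime': ['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10',
                                '2018-02-26 10:58:45', '2018-02-26 10:43:34', '2018-02-26 10:49:51',
                                '2018-02-26 10:51:51', '2018-02-26 10:55:10'],
                   'location': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])
df1[['curr_datetime', 'curr_loc']] = df1.groupby('person')[['datetime', 'location']].shift(-1)

d = {'datetime': 'prev_datetime', 'location': 'prev_loc'}
# .copy() makes df2 an independent frame, so later assignments modify df2 itself
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d).copy()
df2[['next_datetime', 'next_loc']] = df2.groupby('person')[['curr_datetime', 'curr_loc']].shift(-1)

# Row label of each person's earliest prev_datetime, then that row keyed by person
start_idx = df2.groupby('person')['prev_datetime'].idxmin()
start = df2.loc[start_idx, ['person', 'prev_datetime', 'prev_loc']].set_index('person')

# One .loc call with (row mask, column) writes into df2 directly
mask = df2['next_loc'].isna()
df2.loc[mask, 'next_datetime'] = df2.loc[mask, 'person'].map(start['prev_datetime'])
df2.loc[mask, 'next_loc'] = df2.loc[mask, 'person'].map(start['prev_loc'])
print(df2)
```

Mapping through `person` sidesteps the index mismatch, because the lookup is by person rather than by row label.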
Use numpy.roll together with GroupBy.transform:
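import numpy as np

df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])
g = df1.groupby('person')[['datetime','location']]
# roll each person's values up by one row; the first value wraps around to the end
df1[['curr_datetime','curr_loc']] = g.transform(np.roll, -1)
# roll each person's values down by one row; the last value wraps around to the start
df1[['next_datetime','next_loc']] = g.transform(np.roll, 1)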
print (df1)
person datetime location curr_datetime curr_loc \
4 A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a
0 A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
1 A 2018-02-26 10:58:03 b 2018-02-26 10:43:34 e
2 B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g
6 B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
3 B 2018-02-26 10:58:45 d 2018-02-26 10:51:10 c
5 C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h
7 C 2018-02-26 10:55:10 h 2018-02-26 10:49:51 f
next_datetime next_loc
4 2018-02-26 10:58:03 b
0 2018-02-26 10:43:34 e
1 2018-02-26 10:49:32 a
2 2018-02-26 10:58:45 d
6 2018-02-26 10:51:10 c
3 2018-02-26 10:51:51 g
5 2018-02-26 10:55:10 h
7 2018-02-26 10:49:51 f
d = {'datetime':'prev_datetime','location':'prev_loc'}
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
print (df2)
person prev_datetime prev_loc curr_datetime curr_loc \
4 A 2018-02-26 10:43:34 e 2018-02-26 10:49:32 a
0 A 2018-02-26 10:49:32 a 2018-02-26 10:58:03 b
2 B 2018-02-26 10:51:10 c 2018-02-26 10:51:51 g
6 B 2018-02-26 10:51:51 g 2018-02-26 10:58:45 d
5 C 2018-02-26 10:49:51 f 2018-02-26 10:55:10 h
next_datetime next_loc
4 2018-02-26 10:58:03 b
0 2018-02-26 10:43:34 e
2 2018-02-26 10:58:45 d
6 2018-02-26 10:51:10 c
5 2018-02-26 10:55:10 h
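One caveat about the output above: rolling by 1 moves each value one row back, which coincides with "two rows ahead" only when a person has exactly three rows. For person C, who has two rows, it gives 'h' in next_loc, whereas the desired table in the question asks for 'f'. The following is a minimal sketch, assuming the intent is always "two steps ahead, wrapping around to the start", that rolls by -2 instead. The helper `roll_within_person` is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'person': ['A', 'A', 'B', 'B', 'A', 'C', 'B', 'C'],
                   'datetime': ['2018-02-26 10:49:32', '2018-02-26 10:58:03', '2018-02-26 10:51:10',
                                '2018-02-26 10:58:45', '2018-02-26 10:43:34', '2018-02-26 10:49:51',
                                '2018-02-26 10:51:51', '2018-02-26 10:55:10'],
                   'location': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})
df['datetime'] = pd.to_datetime(df['datetime'])
df1 = df.sort_values(by=['person', 'datetime'])


def roll_within_person(frame, column, steps):
    """Hypothetical helper: cyclically shift `column` within each person's rows."""
    return frame.groupby('person')[column].transform(
        lambda s: np.roll(s.to_numpy(), steps)
    )


# -1 looks one step ahead, -2 looks two steps ahead; both wrap around per person
df1['curr_datetime'] = roll_within_person(df1, 'datetime', -1)
df1['curr_loc'] = roll_within_person(df1, 'location', -1)
df1['next_datetime'] = roll_within_person(df1, 'datetime', -2)
df1['next_loc'] = roll_within_person(df1, 'location', -2)

d = {'datetime': 'prev_datetime', 'location': 'prev_loc'}
df2 = df1[df1['person'].duplicated(keep='last')].rename(columns=d)
print(df2[['person', 'prev_loc', 'curr_loc', 'next_loc']])
```

For groups of three rows this gives the same values as rolling by 1, so persons A and B are unchanged. Only groups of other sizes, such as C, differ.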