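python - How to efficiently merge two data frames conditionally
Problem description
I am trying to assign the corresponding schedule number and route number to each GPS packet based on its GPS time stamp. Since I have nearly a million GPS packets from various devices, how can I do this efficiently?
I have not found a good approach. At the moment I loop over all rows, compare each time stamp against every interval in the schedule/route table, and append the matching schedule number to each GPS packet.
GPS data frame: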
import pandas as pd
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})
Schedule data frame:
schedule_df = pd.DataFrame({'Device' :[1, 1, 1, 1, 2, 2, 2, 3,3, 3],
'schedule' :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no' :[1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
'start time' : ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time' :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})
I would like to get output like this:
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],
'time-stamp':['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23'],
'schedule': ['A1','A1','B1','Na','C1','C3','C3'],
'route': [1, 2, 1, 'Na',1, 2, 2]})
Solution
Try this:
import pandas as pd
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})
schedule_df = pd.DataFrame({'Device' :[1, 1, 1, 1, 2, 2, 2, 3,3, 3],
'schedule' :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no' :[1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
'start time' : ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time' :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})
print(gps_df)
print(schedule_df)
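# Pair each packet with every schedule row of the same device, remembering the packet's row label
pairs = gps_df.reset_index().merge(schedule_df, on='Device')
# Compare as Timedelta, not as text (as strings, '6:00:00' sorts after '12:00:00')
t, start, end = (pd.to_timedelta(pairs[c]) for c in ['time-stamp', 'start time', 'end time'])
# Keep the schedule row whose interval contains the time stamp (the first one if several match)
hits = pairs[t.between(start, end)].drop_duplicates('index').set_index('index')
# Cast to object so the route numbers stay integers once missing rows appear
hits = hits[['schedule', 'route no']].rename(columns={'route no': 'route'}).astype(object)
# Left join: packets outside every interval get missing values, shown as 'Na' to mirror the requested output
gps_df = gps_df.join(hits).fillna('Na')
print(gps_df)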
Output
Device time-stamp
0 1 6:00:00
1 1 7:00:30
2 2 12:12:12
3 2 13:13:13
4 3 20:15:10
5 3 22:16:10
6 3 22:18:23
Device schedule route no start time end time
0 1 A1 1 6:00:00 7:00:00
1 1 A1 2 7:00:01 8:30:00
2 1 A2 1 8:30:00 9:30:00
3 1 A2 2 10:00:00 12:00:00
4 2 B1 1 12:00:00 13:00:00
5 2 B2 5 14:00:00 16:00:00
6 2 B2 6 16:00:00 20:00:00
7 3 C1 1 20:00:00 21:00:00
8 3 C2 1 21:00:00 22:00:00
9 3 C3 2 22:00:00 23:00:00
Device time-stamp schedule route
0 1 6:00:00 A1 1
1 1 7:00:30 A1 2
2 2 12:12:12 B1 1
3 2 13:13:13 Na Na
4 3 20:15:10 C1 1
5 3 22:16:10 C3 2
6 3 22:18:23 C3 2
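For around a million packets, one option that avoids comparing every packet with every interval is `pd.merge_asof`. The sketch below is only an illustration under two assumptions: the intervals of a single device do not overlap, and all time stamps fall within one day. The helper name `assign_schedule` and the temporary columns `t`, `row`, `start` and `end` are made up for this example.

```python
import pandas as pd

gps_df = pd.DataFrame({
    'Device': [1, 1, 2, 2, 3, 3, 3],
    'time-stamp': ['6:00:00', '7:00:30', '12:12:12', '13:13:13',
                   '20:15:10', '22:16:10', '22:18:23']})
schedule_df = pd.DataFrame({
    'Device': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'schedule': ['A1', 'A1', 'A2', 'A2', 'B1', 'B2', 'B2', 'C1', 'C2', 'C3'],
    'route no': [1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
    'start time': ['6:00:00', '7:00:01', '8:30:00', '10:00:00', '12:00:00',
                   '14:00:00', '16:00:00', '20:00:00', '21:00:00', '22:00:00'],
    'end time': ['7:00:00', '8:30:00', '9:30:00', '12:00:00', '13:00:00',
                 '16:00:00', '20:00:00', '21:00:00', '22:00:00', '23:00:00']})


def assign_schedule(gps, sched):
    """Hypothetical helper: as-of join of packets to schedule intervals."""
    left = gps.copy()
    left['t'] = pd.to_timedelta(left['time-stamp'])
    left['row'] = range(len(left))  # original position, used to restore the order

    right = sched.rename(columns={'route no': 'route'})
    right['start'] = pd.to_timedelta(right['start time'])
    right['end'] = pd.to_timedelta(right['end time'])
    right = right[['Device', 'schedule', 'route', 'start', 'end']]

    # merge_asof requires both frames to be sorted by their time keys.
    # For each packet it picks the latest interval of the same device
    # whose start is at or before the packet time.
    out = pd.merge_asof(left.sort_values('t'), right.sort_values('start'),
                        left_on='t', right_on='start', by='Device')

    # The picked interval may already be over, so blank out those matches.
    inside = out['t'] <= out['end']  # False when nothing was matched (NaT)
    out['schedule'] = out['schedule'].where(inside)
    out['route'] = out['route'].where(inside).astype('Int64')

    # Put the packets back in their original order and index.
    out = out.sort_values('row')
    out.index = gps.index
    return out[['Device', 'time-stamp', 'schedule', 'route']]


result = assign_schedule(gps_df, schedule_df)
print(result)
```

In this sketch, packets that fall into a gap between intervals come back as missing values rather than the string 'Na'; a final `fillna` can restore the placeholder if it is needed.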
Hope this helps.