How to efficiently merge two DataFrames conditionally

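Problem description

I am trying to assign the corresponding schedule and route number to each GPS packet based on its GPS timestamp. Given that I have nearly a million GPS packets from various devices, how can I do this efficiently?

I have not found a good approach. Currently I loop over every row, compare its timestamp with every interval in the schedule (route) table, and attach the matching schedule number to each GPS packet.

GPS DataFrame: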

import pandas as pd
gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})

Schedule DataFrame:

schedule_df = pd.DataFrame({'Device'    :[1,    1,  1,  1,  2,  2,  2,  3,3,    3],
'schedule'  :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no'  :[1,    2,  1,  2,  1,  5,  6,  1,  1,  2],
'start time' :  ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time'  :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})

I would like to get output like this:

gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],
                   'time-stamp':['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23'],
                    'schedule': ['A1','A1','B1','Na','C1','C3','C3'],
                    'route':    [1, 2,  1,  'Na',1, 2,  2]})
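One detail of the data above is worth checking before any matching is attempted: the time columns are plain strings without zero-padded hours, so comparing them as text does not follow clock order. The short sketch below illustrates this and shows one possible conversion, assuming the values are times of day within a single day; the name `times` is only illustrative.

```python
import pandas as pd

# As strings, times compare character by character, so '6...' sorts after '1...'
as_text = '6:00:00' < '12:12:12'

# As Timedelta values, they compare chronologically
as_time = pd.to_timedelta('6:00:00') < pd.to_timedelta('12:12:12')

print(as_text, as_time)

# The same conversion applied to a whole column (illustrative values)
times = pd.to_timedelta(pd.Series(['6:00:00', '12:12:12', '22:18:23']))
print(times)
```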

Tags: python, pandas, merge

Solution


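Try this:

import pandas as pd

gps_df = pd.DataFrame({'Device':[1,1,2,2,3,3,3],'time-stamp': ['6:00:00','7:00:30','12:12:12','13:13:13','20:15:10','22:16:10','22:18:23']})
schedule_df = pd.DataFrame({'Device'    :[1,    1,  1,  1,  2,  2,  2,  3,3,    3],
'schedule'  :['A1','A1','A2','A2','B1','B2','B2','C1','C2','C3'],
'route no'  :[1,    2,  1,  2,  1,  5,  6,  1,  1,  2],
'start time' :  ['6:00:00','7:00:01','8:30:00','10:00:00','12:00:00','14:00:00','16:00:00','20:00:00','21:00:00','22:00:00'],
'end time'  :['7:00:00','8:30:00','9:30:00','12:00:00','13:00:00','16:00:00','20:00:00','21:00:00','22:00:00','23:00:00']})
print(gps_df)
print(schedule_df)
# Convert the time strings to seconds so they compare numerically, not as text
gps_df['t'] = pd.to_timedelta(gps_df['time-stamp']).dt.total_seconds()
schedule_df['start'] = pd.to_timedelta(schedule_df['start time']).dt.total_seconds()
schedule_df['end'] = pd.to_timedelta(schedule_df['end time']).dt.total_seconds()
# For each packet, take the latest interval of the same device that starts at or before it
merged = pd.merge_asof(gps_df.sort_values('t'), schedule_df.sort_values('start'),
                       left_on='t', right_on='start', by='Device')
# A match only counts if the packet also falls before the interval's end
inside = merged['t'] <= merged['end']
merged['schedule'] = merged['schedule'].where(inside, 'Na')
merged['route'] = merged['route no'].astype('Int64').astype(object).where(inside, 'Na')
gps_df = merged[['Device', 'time-stamp', 'schedule', 'route']]
print(gps_df)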

Output

   Device time-stamp
0       1    6:00:00
1       1    7:00:30
2       2   12:12:12
3       2   13:13:13
4       3   20:15:10
5       3   22:16:10
6       3   22:18:23


   Device schedule  route no start time  end time
0       1       A1         1    6:00:00   7:00:00
1       1       A1         2    7:00:01   8:30:00
2       1       A2         1    8:30:00   9:30:00
3       1       A2         2   10:00:00  12:00:00
4       2       B1         1   12:00:00  13:00:00
5       2       B2         5   14:00:00  16:00:00
6       2       B2         6   16:00:00  20:00:00
7       3       C1         1   20:00:00  21:00:00
8       3       C2         1   21:00:00  22:00:00
9       3       C3         2   22:00:00  23:00:00


   Device time-stamp schedule route
0       1    6:00:00       A1     1
1       1    7:00:30       A1     2
2       2   12:12:12       B1     1
3       2   13:13:13       Na    Na
4       3   20:15:10       C1     1
5       3   22:16:10       C3     2
6       3   22:18:23       C3     2
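As an alternative that is easy to verify by hand, the matching can also be sketched as a plain merge on `Device` followed by an interval filter. This is a minimal sketch, not a tuned solution: it assumes both interval ends are inclusive and that a packet matching more than one interval should keep the first one. The names `gps`, `sched`, `pairs`, `hits`, `result` and the helper column `packet` are illustrative.

```python
import pandas as pd

gps = pd.DataFrame({
    'Device': [1, 1, 2, 2, 3, 3, 3],
    'time-stamp': ['6:00:00', '7:00:30', '12:12:12', '13:13:13',
                   '20:15:10', '22:16:10', '22:18:23'],
})
sched = pd.DataFrame({
    'Device': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'schedule': ['A1', 'A1', 'A2', 'A2', 'B1', 'B2', 'B2', 'C1', 'C2', 'C3'],
    'route no': [1, 2, 1, 2, 1, 5, 6, 1, 1, 2],
    'start time': ['6:00:00', '7:00:01', '8:30:00', '10:00:00', '12:00:00',
                   '14:00:00', '16:00:00', '20:00:00', '21:00:00', '22:00:00'],
    'end time': ['7:00:00', '8:30:00', '9:30:00', '12:00:00', '13:00:00',
                 '16:00:00', '20:00:00', '21:00:00', '22:00:00', '23:00:00'],
})

# Parse the time strings so they compare chronologically, not as text
gps['t'] = pd.to_timedelta(gps['time-stamp'])
sched['start'] = pd.to_timedelta(sched['start time'])
sched['end'] = pd.to_timedelta(sched['end time'])

# Pair each packet with every interval of the same device;
# the 'packet' column remembers which GPS row each pair came from
pairs = gps.rename_axis('packet').reset_index().merge(sched, on='Device', how='left')

# Keep only the pairs whose interval contains the timestamp (both ends inclusive)
hits = pairs[pairs['t'].between(pairs['start'], pairs['end'])]

# If a timestamp sits on a shared boundary, keep the first matching interval
hits = hits.drop_duplicates('packet').set_index('packet')

# Packets without any matching interval end up with NaN in both columns
result = gps[['Device', 'time-stamp']].join(hits[['schedule', 'route no']])
print(result)
```

The intermediate `pairs` table holds one row per packet and per interval of its device, so memory use grows with the number of intervals per device. For around a million packets, a sorted as-of join such as `pd.merge_asof` with `by='Device'` is likely to scale better.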

Hope this helps.

