Merging three Pandas DataFrames on a datetime range and adding the corresponding column

Problem description

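Given three dataframes df1, df2 and df3, how can I join them so that each df3 timestamp is matched to the row of df1 or df2 whose start and end times enclose it?

I need to merge the Job ID into df3 based on whether df3's 'Timestamp' falls between the 'Start Time' and 'End Time' of a row in df1 or df2, and the Node (No.) must match as well.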

df1 (1230 rows * 3 columns)

Node      Start Time      End Time      JobID
A         00:03:50        00:05:45      12345
A         00:06:10        00:07:39      56789
A         00:08:30        00:10:45      34567
.
.
.

df2 (1130 rows * 3 columns)

Node      Start Time      End Time      JobID
B         00:02:30        00:07:35      13579
B         00:08:56        00:09:39      24680
B         00:10:32        00:13:47      14680
.
.
.

df3 (4002 rows * 3 columns)

Node      Timestamp     
A         00:05:42       
A         00:09:50       
A         00:11:27       
B         00:04:48
B         00:09:59
B         00:10:32
.
.
.
.

Expected output: df3 (4002 rows * 3 columns)

No.       Timestamp       Job ID
A         00:05:42        12345              
A         00:09:50        34567       
A         00:11:27        NaN
B         00:04:48        13579
B         00:09:59        NaN
B         00:10:32        14680
.
.
.
.

Tags: python, pandas, dataframe, datetime, merge

Solution


You can use .merge() on 'Node' and then filter the merged rows with .between(), as follows:

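import pandas as pd

# Pair every job window with every timestamp on the same Node, then keep only
# the pairs where Timestamp lies within [Start Time, End Time] (inclusive)
df1_3 = df1.merge(df3, on='Node')
df1_3_filtered = df1_3[df1_3['Timestamp'].between(df1_3['Start Time'], df1_3['End Time'])]

df2_3 = df2.merge(df3, on='Node')
df2_3_filtered = df2_3[df2_3['Timestamp'].between(df2_3['Start Time'], df2_3['End Time'])]

# DataFrame.append() was removed in pandas 2.0, so stack the results with pd.concat()
df_out = pd.concat([df1_3_filtered, df2_3_filtered])[['Node', 'JobID', 'Timestamp']]
# Left-merge back onto df3 so that unmatched timestamps are kept with JobID = NaN
df_out = df3.merge(df_out, how='left')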

Result:

print(df_out)


  Node Timestamp    JobID
0    A  00:05:42  12345.0
1    A  00:09:50  34567.0
2    A  00:11:27      NaN
3    B  00:04:48  13579.0
4    B  00:09:59      NaN
5    B  00:10:32  14680.0
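
To check the approach end to end, here is a minimal, self-contained sketch that rebuilds only the sample rows shown in the question and reproduces the table above. It assumes the times are stored as zero-padded HH:MM:SS strings, so plain string comparison orders them correctly. It also stacks df1 and df2 with pd.concat before merging instead of merging each one separately. The names jobs, pairs, in_window and matched are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; times are zero-padded HH:MM:SS strings.
df1 = pd.DataFrame({
    "Node": ["A", "A", "A"],
    "Start Time": ["00:03:50", "00:06:10", "00:08:30"],
    "End Time": ["00:05:45", "00:07:39", "00:10:45"],
    "JobID": [12345, 56789, 34567],
})
df2 = pd.DataFrame({
    "Node": ["B", "B", "B"],
    "Start Time": ["00:02:30", "00:08:56", "00:10:32"],
    "End Time": ["00:07:35", "00:09:39", "00:13:47"],
    "JobID": [13579, 24680, 14680],
})
df3 = pd.DataFrame({
    "Node": ["A", "A", "A", "B", "B", "B"],
    "Timestamp": ["00:05:42", "00:09:50", "00:11:27",
                  "00:04:48", "00:09:59", "00:10:32"],
})

# Stack the job tables, then pair each job with every timestamp on its Node.
jobs = pd.concat([df1, df2], ignore_index=True)
pairs = jobs.merge(df3, on="Node")

# Keep the pairs whose Timestamp lies inside the job window (both ends inclusive).
in_window = pairs["Timestamp"].between(pairs["Start Time"], pairs["End Time"])
matched = pairs.loc[in_window, ["Node", "Timestamp", "JobID"]]

# Left-merge onto df3 so timestamps without a matching job keep a NaN JobID.
df_out = df3.merge(matched, on=["Node", "Timestamp"], how="left")
print(df_out)
```

Because .between() includes both endpoints by default, the row B 00:10:32 matches the job that starts at exactly 00:10:32.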

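Edit

If you have multiple dataframes with the same structure as df1 and df2 that you want to merge with df3, you can do the following:

Just put all of those dataframes into the list List_dfs below: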

List_dfs = [df1, df2]              # put all your dataframes of same structure here

Then run the code below. The merged and filtered result for all of these dataframes ends up in df_out:

filtered_parts = []   # accumulate the filtered result of each dataframe
for df in List_dfs:
    dfx_3 = df.merge(df3, on='Node')
    dfx_3_filtered = dfx_3[dfx_3['Timestamp'].between(dfx_3['Start Time'], dfx_3['End Time'])]
    filtered_parts.append(dfx_3_filtered)   # plain list append

# DataFrame.append() was removed in pandas 2.0, so combine everything once with pd.concat()
df_all_filtered = pd.concat(filtered_parts)

df_out = df_all_filtered[['Node', 'JobID', 'Timestamp']]
df_out = df3.merge(df_out, how='left')
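
Merging on 'Node' alone pairs every timestamp with every job on that node before filtering, which can get large with thousands of rows per node. One possible alternative, which is not part of the answer above, is pd.merge_asof: for each timestamp it picks the latest job on the same node that started at or before it, and the end time is then checked separately. The sketch below assumes that job windows on the same node never overlap and that the time strings parse with pd.to_timedelta. The function name attach_job_ids and the helper columns ts, start, end and _order are hypothetical.

```python
import pandas as pd

def attach_job_ids(jobs: pd.DataFrame, stamps: pd.DataFrame) -> pd.DataFrame:
    """Label each row of `stamps` with the JobID whose window contains its Timestamp.

    Assumes windows on the same Node do not overlap (illustrative sketch only).
    """
    # merge_asof needs ordered, non-string keys, so convert and sort both sides.
    right = jobs.assign(
        start=pd.to_timedelta(jobs["Start Time"]),
        end=pd.to_timedelta(jobs["End Time"]),
    ).sort_values("start")[["Node", "start", "end", "JobID"]]
    left = stamps.assign(
        ts=pd.to_timedelta(stamps["Timestamp"]),
        _order=range(len(stamps)),      # remember the original row order
    ).sort_values("ts")

    # For each timestamp: latest job on the same Node with start <= timestamp.
    merged = pd.merge_asof(
        left, right, left_on="ts", right_on="start",
        by="Node", direction="backward",
    )
    # Discard the candidate if the timestamp is past the job's end time.
    merged["JobID"] = merged["JobID"].where(merged["ts"] <= merged["end"])

    out = merged.sort_values("_order")[["Node", "Timestamp", "JobID"]]
    return out.reset_index(drop=True)

# Sample rows from the question, with df1 and df2 already stacked into one table.
jobs = pd.DataFrame({
    "Node": ["A", "A", "A", "B", "B", "B"],
    "Start Time": ["00:03:50", "00:06:10", "00:08:30",
                   "00:02:30", "00:08:56", "00:10:32"],
    "End Time": ["00:05:45", "00:07:39", "00:10:45",
                 "00:07:35", "00:09:39", "00:13:47"],
    "JobID": [12345, 56789, 34567, 13579, 24680, 14680],
})
df3 = pd.DataFrame({
    "Node": ["A", "A", "A", "B", "B", "B"],
    "Timestamp": ["00:05:42", "00:09:50", "00:11:27",
                  "00:04:48", "00:09:59", "00:10:32"],
})

result = attach_job_ids(jobs, df3)
print(result)
```

If windows on a node can overlap, this lookup only ever considers the most recently started job, so the merge-and-filter approach above is the safer choice in that case.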
