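python - How do I outer-merge two dataframes where a column value falls within a range?
Problem description
This is a follow-up to this question.
I have two dataframes: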
print df_1
timestamp A B
0 2016-05-15 0.020228 0.026572
1 2016-05-15 0.057780 0.175499
2 2016-05-15 0.098808 0.620986
3 2016-05-17 0.158789 1.014819
4 2016-05-17 0.038129 2.384590
5 2018-05-17 0.011111 9.999999
print df_2
start end event
0 2016-05-14 2016-05-16 E1
1 2016-05-14 2016-05-16 E2
2 2016-05-17 2016-05-18 E3
I want to merge df_1 and df_2 so that the event column is brought into df_1 whenever timestamp lies between start and end.
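The problems, and the differences from that question, are:
1) Events E1 and E2 have the same start and end.
2) The 6th row of df_1 does not fall within any interval.
In the end, I want rows for both events, and NA for rows that have no event.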
So I would like my resulting dataframe to look like this:
timestamp A B event
0 2016-05-15 0.020228 0.026572 E1
1 2016-05-15 0.057780 0.175499 E1
2 2016-05-15 0.098808 0.620986 E1
3 2016-05-15 0.020228 0.026572 E2
4 2016-05-15 0.057780 0.175499 E2
5 2016-05-15 0.098808 0.620986 E2
6 2016-05-17 0.158789 1.014819 E3
7 2016-05-17 0.038129 2.384590 E3
8 2018-05-17 0.011111 9.999999 NA
Solution
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_1.timestamp = pd.to_datetime(df_1.timestamp, format='%Y-%m-%d')
df_2.start = pd.to_datetime(df_2.start, format='%Y-%m-%d')
df_2.end = pd.to_datetime(df_2.end, format='%Y-%m-%d')
# convert df_2 to long format with one row per day between start and end, then left-merge on the date
df_2_2 = pd.melt(df_2, id_vars='event', value_name='timestamp')
df_2_2.timestamp = pd.to_datetime(df_2_2.timestamp)
df_2_2.set_index('timestamp', inplace=True)
df_2_2.drop('variable', axis=1, inplace=True)
df_2_3 = df_2_2.groupby('event').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
df_2 = pd.merge(df_2, df_2_3)
df_2 = df_2.drop(columns=['start', 'end'])
df_1 = df_1.merge(df_2,on='timestamp', how='left')
print(df_1)
timestamp A event
0 2016-05-15 1 E1
1 2016-05-15 1 E2
2 2016-05-15 1 E1
3 2016-05-15 1 E2
4 2016-05-15 1 E1
5 2016-05-15 1 E2
6 2016-05-17 1 E3
7 2016-05-17 1 E3
8 2018-05-17 1 NaN
Credit to this.
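The day-by-day expansion used above can also be sketched with pd.date_range instead of melt and resample. This is only an illustrative variant, not the answer's own method: the names days and result are made up here, and it assumes, as in the sample data, that timestamp, start and end are all whole dates with no time-of-day component.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-15', '2016-05-15', '2016-05-15',
                                 '2016-05-17', '2016-05-17', '2018-05-17']),
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14', '2016-05-14', '2016-05-17']),
    'end': pd.to_datetime(['2016-05-16', '2016-05-16', '2016-05-18']),
    'event': ['E1', 'E2', 'E3'],
})

# Build one (day, event) row for every day in each closed interval [start, end].
# 'days' is a hypothetical helper name, not part of the original answer.
days = pd.DataFrame(
    [(day, event)
     for start, end, event in zip(df_2['start'], df_2['end'], df_2['event'])
     for day in pd.date_range(start, end, freq='D')],
    columns=['timestamp', 'event'],
)

# A left merge keeps every row of df_1; rows with no matching day get NaN.
result = df_1.merge(days, on='timestamp', how='left')
print(result)
```

Because every interval is expanded into daily rows, this approach is only practical when the intervals are short and the timestamps are day-granular.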
There is also this solution, but it does not produce the NA in the last row:
import pandas as pd
df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
'end':['2016-05-16','2016-05-16','2016-05-18'],
'event':['E1','E2','E3']})
df_try2 = pd.merge(df_1.assign(key=1), df_2.assign(key=1), on='key').query('timestamp >= start and timestamp <= end')
print(df_try2)
timestamp A key start end event
0 2016-05-15 1 1 2016-05-14 2016-05-16 E1
1 2016-05-15 1 1 2016-05-14 2016-05-16 E2
3 2016-05-15 1 1 2016-05-14 2016-05-16 E1
4 2016-05-15 1 1 2016-05-14 2016-05-16 E2
6 2016-05-15 1 1 2016-05-14 2016-05-16 E1
7 2016-05-15 1 1 2016-05-14 2016-05-16 E2
11 2016-05-17 1 1 2016-05-17 2016-05-18 E3
14 2016-05-17 1 1 2016-05-17 2016-05-18 E3
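The row without an event disappears because the filter drops every pair that fails the range test. One way to get it back, sketched here under the assumption that a left merge onto the original rows is acceptable, is to tag each row of df_1 with an identifier before the cross join and merge the matches back afterwards. The names left, row, pairs, matched and result are hypothetical, and how='cross' requires pandas 1.2 or later (the key=1 trick above does the same job on older versions).

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-15', '2016-05-15', '2016-05-15',
                                 '2016-05-17', '2016-05-17', '2018-05-17']),
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14', '2016-05-14', '2016-05-17']),
    'end': pd.to_datetime(['2016-05-16', '2016-05-16', '2016-05-18']),
    'event': ['E1', 'E2', 'E3'],
})

# Keep the original row label in a column so unmatched rows can be restored.
left = df_1.reset_index().rename(columns={'index': 'row'})

# Cross join every row of df_1 with every interval, then keep only the pairs
# whose timestamp lies inside the closed interval [start, end].
pairs = left.merge(df_2, how='cross')
in_range = pairs['timestamp'].between(pairs['start'], pairs['end'])
matched = pairs.loc[in_range, ['row', 'event']]

# Left-merge the matches back: rows of df_1 with no interval get NaN.
result = left.merge(matched, on='row', how='left').drop(columns='row')
print(result)
```

The cross join creates len(df_1) * len(df_2) intermediate rows, so this is best suited to small or moderately sized inputs.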