python - Merging Dataframe with Different Dates?
问题描述
I want to merge a seperate dataframe (df2) with the main dataframe (df1), but if, for a given row, the dates in df1 do not exist in df2, then search for the recent date before the underlying date in df1.
I tried to use pd.merge, but it would remove rows with unmatched dates, and only keep the rows that matched in both df's.
df1 = [['2007-01-01','A'],
['2007-01-02','B'],
['2007-01-03','C'],
['2007-01-04','B'],
['2007-01-06','C']]
df2 = [['2007-01-01','B',3],
['2007-01-02','A',4],
['2007-01-03','B',5],
['2007-01-06','C',3]]
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
df1[0] = pd.to_datetime(df1[0])
df2[0] = pd.to_datetime(df2[0])
Current df1 | pd.merge():
0 1 2
0 2007-01-06 C 3
Only gets the exact date between both df's, it does not consider value from recent dates.
Expected df1:
0 1 2
0 2007-01-01 A NaN
1 2007-01-02 B 3
2 2007-01-03 C NaN
3 2007-01-04 B 3
4 2007-01-06 C 3
Getting NaNs because data doesn't exist on or before that date in df2. For index row 1, it gets data before a day before, while index row 4, it gets data exactly on the same day.
解决方案
使用检查您的输出merge_asof
pd.merge_asof(df1,df2,on=0,by=1,allow_exact_matches=True)
Out[15]:
0 1 2
0 2007-01-01 A NaN
1 2007-01-02 B 3.0
2 2007-01-03 C NaN
3 2007-01-04 B 5.0 # here should be 5 since 5 ' date is more close. also df2 have two B
4 2007-01-06 C 3.0
推荐阅读
- setcookie - IBM Http Server - 无法更新 cookie
- django - 模板中未显示的表单中的 DJANGO 初始值(有些不显示)
- laravel - 在 https 中加载 swagger UI 内容 - Lumen
- ms-access-2010 - 我可以在 Ms Access Form 的未绑定文本字段中的 Dcount 公式后添加文本吗?
- vue.js - Promise 被拒绝并且无法在 VUE 和 Directus 中获取Collections 或 getItems
- laravel - 如何修复 Laravel 十进制数问题
- r - R 混淆矩阵误差数据和参考因素同级别
- matplotlib - 从轮子为 aws lambda 安装 matplotlib
- sql - 从表中选择最近 3 个连续状态为失败的记录
- mysql - 如何在MYSQL中按正则表达式按组选择计数数据?