首页 > 解决方案 > Merging Dataframe with Different Dates?

问题描述

I want to merge a seperate dataframe (df2) with the main dataframe (df1), but if, for a given row, the dates in df1 do not exist in df2, then search for the recent date before the underlying date in df1.

I tried to use pd.merge, but it would remove rows with unmatched dates, and only keep the rows that matched in both df's.

df1 = [['2007-01-01','A'],
       ['2007-01-02','B'],
       ['2007-01-03','C'],
       ['2007-01-04','B'],
       ['2007-01-06','C']]

df2 = [['2007-01-01','B',3],
       ['2007-01-02','A',4],
       ['2007-01-03','B',5],
       ['2007-01-06','C',3]]

df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
df1[0] = pd.to_datetime(df1[0])
df2[0] = pd.to_datetime(df2[0])

Current df1 | pd.merge():

    0           1   2
0   2007-01-06  C   3  

Only gets the exact date between both df's, it does not consider value from recent dates.

Expected df1:

    0           1   2
0   2007-01-01  A   NaN
1   2007-01-02  B   3
2   2007-01-03  C   NaN
3   2007-01-04  B   3
4   2007-01-06  C   3

Getting NaNs because data doesn't exist on or before that date in df2. For index row 1, it gets data before a day before, while index row 4, it gets data exactly on the same day.

标签: pythonpandasdataframedata-manipulation

解决方案


使用检查您的输出merge_asof

pd.merge_asof(df1,df2,on=0,by=1,allow_exact_matches=True)
Out[15]: 
           0  1    2
0 2007-01-01  A  NaN
1 2007-01-02  B  3.0
2 2007-01-03  C  NaN
3 2007-01-04  B  5.0 # here should be 5 since 5 ' date is more close. also df2 have two B 
4 2007-01-06  C  3.0

推荐阅读