How to fill in previous values from the second DataFrame in a pandas join

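Problem description

I want to join two DataFrames and fill in any NaN values. However, df has no row for the first index value of df2, so the join leaves NaN there. How can I fill it with the previous value from df?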

import pandas as pd
from datetime import datetime, timedelta

# df: daily rows starting now, with the 4th day removed
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
data = range(len(days)-1)
days = days.delete(3)
date_today = date_today + timedelta(days=3)
df = pd.DataFrame({'test': days, 'col_df': data})
df = df.set_index('test')
print(df)

# df2: daily rows starting 3 days later, on the day removed from df
days2 = pd.date_range(date_today, date_today + timedelta(7), freq='D')
data2 = range(len(days2))
df2 = pd.DataFrame({'test': days2, 'col_df22': data2})
df2 = df2.set_index('test')
print(df2)

# Left join on the index: rows of df2 with no match in df get NaN
print(df2.join(df))

df

                            col_df
test                              
2020-12-08 15:22:00.997578       0
2020-12-09 15:22:00.997578       1
2020-12-10 15:22:00.997578       2
2020-12-12 15:22:00.997578       3
2020-12-13 15:22:00.997578       4
2020-12-14 15:22:00.997578       5
2020-12-15 15:22:00.997578       6

df2

                            col_df22
test                                
2020-12-11 15:22:00.997578         0
2020-12-12 15:22:00.997578         1
2020-12-13 15:22:00.997578         2
2020-12-14 15:22:00.997578         3
2020-12-15 15:22:00.997578         4
2020-12-16 15:22:00.997578         5
2020-12-17 15:22:00.997578         6
2020-12-18 15:22:00.997578         7

df2.join(df)

                            col_df22  col_df
test                                        
2020-12-11 15:22:00.997578         0     NaN
2020-12-12 15:22:00.997578         1     3.0
2020-12-13 15:22:00.997578         2     4.0
2020-12-14 15:22:00.997578         3     5.0
2020-12-15 15:22:00.997578         4     6.0
2020-12-16 15:22:00.997578         5     NaN
2020-12-17 15:22:00.997578         6     NaN
2020-12-18 15:22:00.997578         7     NaN
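
A minimal sketch of why forward-filling the joined result is not enough by itself. It uses fixed dates (an assumption, chosen so the result is reproducible) in place of datetime.now(). The left join keeps only the rows of df2, so the 2020-12-10 row of df is already gone by the time ffill runs:

```python
import pandas as pd

# Fixed dates (assumed for reproducibility) instead of datetime.now()
days = pd.date_range("2020-12-08", periods=8, freq="D").delete(3)  # drops 2020-12-11
df = pd.DataFrame({"col_df": range(len(days))}, index=days)

days2 = pd.date_range("2020-12-11", periods=8, freq="D")
df2 = pd.DataFrame({"col_df22": range(len(days2))}, index=days2)

# Left join on the index, then forward-fill the gaps
joined = df2.join(df).ffill()

# The trailing NaNs are filled with 6.0, but the first row stays NaN:
# nothing precedes it in the joined frame to fill from.
print(joined)
```

The value needed for the first row (2.0) lives in a row of df that the join discards, which is why the fill has to look into df itself.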

Desired output:

                            col_df22  col_df
test                                        
2020-12-11 15:22:00.997578         0     2.0
2020-12-12 15:22:00.997578         1     3.0
2020-12-13 15:22:00.997578         2     4.0
2020-12-14 15:22:00.997578         3     5.0
2020-12-15 15:22:00.997578         4     6.0
2020-12-16 15:22:00.997578         5     6.0
2020-12-17 15:22:00.997578         6     6.0
2020-12-18 15:22:00.997578         7     6.0

Tags: python-3.x, pandas

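Solution

You can try merge_asof. By default it matches backward: each row of df2 gets the last row of df whose index is at or before its own, which fills both the missing first value and the trailing rows. Both indexes must be sorted.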

pd.merge_asof(df2, df, left_index=True, right_index=True)

Output:

                            col_df22  col_df
test                                        
2020-12-11 10:30:20.464611         0       2
2020-12-12 10:30:20.464611         1       3
2020-12-13 10:30:20.464611         2       4
2020-12-14 10:30:20.464611         3       5
2020-12-15 10:30:20.464611         4       6
2020-12-16 10:30:20.464611         5       6
2020-12-17 10:30:20.464611         6       6
2020-12-18 10:30:20.464611         7       6
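
The result can be reproduced with fixed dates, sketched below (the dates are an assumption standing in for datetime.now()). The sketch also shows an alternative that should give the same values for this data: reindexing df onto the index of df2 with method="ffill" before joining.

```python
import pandas as pd

# Fixed dates (assumed for reproducibility) instead of datetime.now()
days = pd.date_range("2020-12-08", periods=8, freq="D").delete(3)  # drops 2020-12-11
df = pd.DataFrame({"col_df": range(len(days))}, index=days)

days2 = pd.date_range("2020-12-11", periods=8, freq="D")
df2 = pd.DataFrame({"col_df22": range(len(days2))}, index=days2)

# Backward as-of match: the last row of df at or before each df2 timestamp.
# Both indexes must be sorted.
asof = pd.merge_asof(df2, df, left_index=True, right_index=True)

# Alternative: forward-fill df onto df2's index, then do an ordinary join
alt = df2.join(df.reindex(df2.index, method="ffill"))

print(asof)
print(alt)
```

merge_asof is the more direct choice here; the reindex variant may be easier to read if the rest of the code already uses join.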
