How do I concatenate and fill at the same time?

Question

Suppose I have three DataFrames:

from pandas import DataFrame, concat

df1 = DataFrame([
    [1],
    [3],
    [4]
],
    index=[1, 3, 4],
    columns=['value1']
)

df2 = DataFrame([
    [5],
    [6],
    [7],
],
    index=[5, 6, 7],
    columns=['value2']
)

df3 = DataFrame([
    [5, 9],
    [6, 10],
    [7, 11],
    [8, 12]
],
    index=[5, 6, 7, 8],
    columns=['value1', 'value2']
)

Using

concat([df1, df2, df3], sort=True, axis=1)

currently gives me

   value1  value2  value1  value2
1     1.0     NaN     NaN     NaN
3     3.0     NaN     NaN     NaN
4     4.0     NaN     NaN     NaN
5     NaN     5.0     5.0     9.0
6     NaN     6.0     6.0    10.0
7     NaN     7.0     7.0    11.0
8     NaN     NaN     8.0    12.0

How can I get this result instead?

   value1  value2
1     1.0     NaN
3     3.0     NaN
4     4.0     NaN
5     5.0     5.0
6     6.0     6.0
7     7.0     7.0
8     8.0    12.0

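In other words, for columns with the same name, how can I merge them "to the left", so that the first non-missing value wins? I am looking for a general solution that works with any number of columns sharing the same name, as well as with column names that appear only once.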

Tags: python, pandas

Solution


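Use DataFrame.combine_first. It keeps the values of the calling DataFrame and fills its missing values from the other DataFrame; the result covers the union of both indexes and both sets of columns.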

df = df1.combine_first(df2).combine_first(df3)
print(df)
   value1  value2
1     1.0     NaN
3     3.0     NaN
4     4.0     NaN
5     5.0     5.0
6     6.0     6.0
7     7.0     7.0
8     8.0    12.0
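
Because combine_first only fills gaps, the order of the calls decides which frame wins where they overlap. The following is a minimal sketch of that precedence; the frames `left` and `right` and their values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical frames: `left` has a gap at index 2, `right` is fully populated.
left = pd.DataFrame({"value1": [1.0, None]}, index=[1, 2])
right = pd.DataFrame(
    {"value1": [10.0, 20.0], "value2": [30.0, 40.0]},
    index=[1, 2],
)

# `left` takes precedence: its 1.0 is kept, only its missing value is filled.
left_first = left.combine_first(right)

# `right` takes precedence: it has no gaps, so nothing from `left` is used.
right_first = right.combine_first(left)

print(left_first)
print(right_first)
```

This is why df1 comes first in the chain above: the question asks for values to be merged "to the left".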

A more general solution, for a list of DataFrames, is to use reduce:

from functools import reduce

import pandas as pd

dfs = [df1, df2, df3]
# Fold the list left to right so that earlier frames take precedence
df = reduce(lambda l, r: pd.DataFrame.combine_first(l, r), dfs)
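
For convenience, the reduce approach can be wrapped in a small helper. This is only a sketch: the function name `combine_left` is hypothetical, and the three frames are rebuilt from the question's data using dict constructors so the snippet runs on its own.

```python
from functools import reduce

import pandas as pd


def combine_left(frames):
    """Merge frames so that earlier frames take precedence over later ones."""
    return reduce(lambda left, right: left.combine_first(right), frames)


# Same data as in the question
df1 = pd.DataFrame({"value1": [1, 3, 4]}, index=[1, 3, 4])
df2 = pd.DataFrame({"value2": [5, 6, 7]}, index=[5, 6, 7])
df3 = pd.DataFrame(
    {"value1": [5, 6, 7, 8], "value2": [9, 10, 11, 12]},
    index=[5, 6, 7, 8],
)

result = combine_left([df1, df2, df3])
print(result)
```

Note that reduce raises a TypeError when given an empty list and no initial value, so a caller that may pass an empty list would need to handle that case separately.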
