Pandas: best way to join two dataframes on a common column

Problem description

I know this is a basic question, but please hear me out.

I have the following dataframes:

In [722]: m1
Out[722]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101          NaN        NaN
2        102         91.0       True
3        103          NaN        NaN
4        104         94.0       True
5        105          NaN        NaN
6        106          NaN        NaN

In [721]: m3
Out[721]: 
   Person_id  Evidence_14 Feature_14
0        100          NaN        NaN
1        101         99.0      False
2        102          NaN        NaN
3        103         95.0      False
4        104          NaN        NaN
5        105          NaN        NaN
6        106         93.0      False

Expected output:

In [734]: z
Out[734]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101         99.0      False
2        102         91.0       True
3        103         95.0      False
4        104         94.0       True
5        105          NaN        NaN
6        106         93.0      False

I was able to solve it like this:

In [725]: z = m1.merge(m3, on='Person_id')
In [728]: z['Evidence_14'] = z.Evidence_14_x.combine_first(z.Evidence_14_y)
In [731]: z['Feature_14'] = z.Feature_14_x.combine_first(z.Feature_14_y)
In [733]: z.drop(columns=['Evidence_14_x', 'Evidence_14_y', 'Feature_14_x', 'Feature_14_y'], inplace=True)

In [734]: z
Out[734]: 
   Person_id  Evidence_14 Feature_14
0        100         90.0       True
1        101         99.0      False
2        102         91.0       True
3        103         95.0      False
4        104         94.0       True
5        105          NaN        NaN
6        106         93.0      False
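The merge-then-coalesce steps above can be folded into one reusable function that handles every shared column at once. This is a minimal sketch, not the asker's code: `merge_and_coalesce` is a hypothetical helper name, the frames are rebuilt from the printed output (Feature_14 is assumed to be object dtype holding True/False/NaN), and `how="outer"` is an assumed choice so that ids present in only one frame are kept (the original used the default inner join).

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the printed output (assumption: Feature_14 is object dtype).
m1 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})


def merge_and_coalesce(left, right, key):
    """Hypothetical helper: merge on `key`, then fill each shared column's NaNs from `right`."""
    z = left.merge(right, on=key, how="outer", suffixes=("_x", "_y"))
    shared = [c for c in left.columns if c != key and c in right.columns]
    for col in shared:
        # Keep the left value where present, otherwise take the right one.
        z[col] = z[f"{col}_x"].combine_first(z[f"{col}_y"])
    # Drop the suffixed helper columns; the columns= keyword also works in pandas 2.x.
    return z.drop(columns=[f"{c}_{s}" for c in shared for s in ("x", "y")])


z = merge_and_coalesce(m1, m3, "Person_id")
print(z)
```

Because the loop covers every shared non-key column, more `Evidence_*`/`Feature_*` pairs need no extra lines.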

But is there a cleaner/better way to do this? Am I missing something really obvious?

Tags: python, python-3.x, pandas, dataframe

Solution


If the column names match and rows need to be matched by their Person_id values, use:

m = m1.set_index('Person_id').combine_first(m3.set_index('Person_id')).reset_index()
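A self-contained sketch of the set_index/combine_first approach, assuming the frames are rebuilt from the printed output in the question and that the second frame is the question's m3 (Feature_14 assumed to be object dtype). Moving Person_id into the index makes combine_first align rows by id: values already present in m1 win, and m1's NaN cells are filled from m3.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the printed output (assumption: Feature_14 is object dtype).
m1 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# Align on Person_id, prefer m1's values, fill its gaps from m3,
# then turn Person_id back into an ordinary column.
m = (
    m1.set_index("Person_id")
      .combine_first(m3.set_index("Person_id"))
      .reset_index()
)
print(m)
```

combine_first returns the union of both indexes, so an id that appears in only one frame still shows up in the result.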

If both DataFrames have the same index values and the same Person_id in each row, the solution can be simplified to matching on the original index values:

m = m1.combine_first(m3)
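A sketch of the shortcut with a safety check, assuming the same rebuilt frames as in the question. Without set_index, combine_first aligns rows by the default 0..n-1 index rather than by Person_id, so the guard below (an addition, not part of the original answer) refuses to run when the ids differ row by row.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the printed output (assumption: Feature_14 is object dtype).
m1 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# The shortcut matches rows by position (the default index), not by Person_id,
# so only use it when both frames list the same ids in the same order.
if not m1["Person_id"].equals(m3["Person_id"]):
    raise ValueError("Person_id differs row by row; align on Person_id instead")

m = m1.combine_first(m3)
print(m)
```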
