python - How do you merge pandas DataFrames of different shapes?
Question
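I am trying to merge two large DataFrames in pandas, and it is giving me some trouble. I'll illustrate with a much smaller example.
df1 holds a list of equipment and several columns describing each item: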
Item ID Equipment Owner Status Location
1 Jackhammer James Active London
2 Cement Mixer Tim Active New York
3 Drill Sarah Active Paris
4 Ladder Luke Inactive Hong Kong
5 Winch Kojo Inactive Sydney
6 Circular Saw Alex Active Moscow
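df2 holds a list of instances in which equipment was used. It shares some columns with df1, but some of its fields are NaN, and it also records uses of equipment that is not in df1: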
Item ID Equipment Owner Date Location
1 Jackhammer James 08/09/2020 London
1 Jackhammer James 08/10/2020 London
2 Cement Mixer NaN 29/02/2020 New York
3 Drill Sarah 11/02/2020 NaN
3 Drill Sarah 30/11/2020 NaN
3 Drill Sarah 21/12/2020 NaN
6 Circular Saw Alex 19/06/2020 Moscow
7 Hammer Ken 21/12/2020 Toronto
8 Sander Ezra 19/06/2020 Frankfurt
The DataFrame I would like to end up with looks like this:
Item ID Equipment Owner Status Date Location
1 Jackhammer James Active 08/09/2020 London
1 Jackhammer James Active 08/10/2020 London
2 Cement Mixer Tim Active 29/02/2020 New York
3 Drill Sarah Active 11/02/2020 Paris
3 Drill Sarah Active 30/11/2020 Paris
3 Drill Sarah Active 21/12/2020 Paris
4 Ladder Luke Inactive NaN Hong Kong
5 Winch Kojo Inactive NaN Sydney
6 Circular Saw Alex Active 19/06/2020 Moscow
7 Hammer Ken NaN 21/12/2020 Toronto
8 Sander Ezra NaN 19/06/2020 Frankfurt
Instead, the following code gives me duplicated columns, which I think is because of the NaN values:
data = pd.merge(df1, df2, how='outer', on=['Item ID'])
Item ID Equipment_x Equipment_y Owner_x Owner_y Status Date Location_x Location_y
1 Jackhammer NaN James James Active 08/09/2020 London London
1 Jackhammer NaN James James Active 08/10/2020 London London
2 Cement Mixer NaN Tim NaN Active 29/02/2020 New York New York
3 Drill NaN Sarah Sarah Active 11/02/2020 Paris NaN
3 Drill NaN Sarah Sarah Active 30/11/2020 Paris NaN
3 Drill NaN Sarah Sarah Active 21/12/2020 Paris NaN
4 Ladder NaN Luke NaN Inactive NaN Hong Kong Hong Kong
5 Winch NaN Kojo NaN Inactive NaN Sydney Sydney
6 Circular Saw NaN Alex NaN Active 19/06/2020 Moscow Moscow
7 NaN Hammer NaN Ken NaN 21/12/2020 NaN Toronto
8 NaN Sander NaN Ezra NaN 19/06/2020 NaN Frankfurt
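Ideally I could just drop the _y columns, but the data in the bottom rows means I would lose important information. The only alternative I can think of is to merge the columns and force pandas to compare the values in each column and always favour the non-NaN value. I'm not sure whether that is possible?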
Answer
"merge the columns and force pandas to compare the values in each column and always favour the non-NaN value."
Is this what you mean?
In [45]: data = pd.merge(df1, df2, how='outer', on=['Item ID', 'Equipment'])
In [46]: data['Location'] = data['Location_y'].fillna(data['Location_x'])
In [47]: data['Owner'] = data['Owner_y'].fillna(data['Owner_x'])
In [48]: data = data.drop(['Location_x', 'Location_y', 'Owner_x', 'Owner_y'], axis=1)
In [49]: data
Out[49]:
Item ID Equipment Status Date Location Owner
0 1 Jackhammer Active 08/09/2020 London James
1 1 Jackhammer Active 08/10/2020 London James
2 2 Cement Mixer Active 29/02/2020 New York Tim
3 3 Drill Active 11/02/2020 Paris Sarah
4 3 Drill Active 30/11/2020 Paris Sarah
5 3 Drill Active 21/12/2020 Paris Sarah
6 4 Ladder Inactive NaN Hong Kong Luke
7 5 Winch Inactive NaN Sydney Kojo
8 6 Circular Saw Active 19/06/2020 Moscow Alex
9 7 Hammer NaN 21/12/2020 Toronto Ken
10 8 Sander NaN 19/06/2020 Frankfurt Ezra
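(As far as I know) you can't really merge on null columns. However, you can use fillna to take a value and replace it with something else if it is NaN. Not a very elegant solution, but it does seem to solve your example at least.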
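For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same merge-then-fillna idea. The two frames are rebuilt from the tables in the question; the names keys and shared and the loop over the shared columns are my own additions for illustration, not part of the session above, and the dates are kept as plain strings.

```python
import numpy as np
import pandas as pd

# Rebuild the two example frames from the question.
df1 = pd.DataFrame({
    "Item ID": [1, 2, 3, 4, 5, 6],
    "Equipment": ["Jackhammer", "Cement Mixer", "Drill",
                  "Ladder", "Winch", "Circular Saw"],
    "Owner": ["James", "Tim", "Sarah", "Luke", "Kojo", "Alex"],
    "Status": ["Active", "Active", "Active",
               "Inactive", "Inactive", "Active"],
    "Location": ["London", "New York", "Paris",
                 "Hong Kong", "Sydney", "Moscow"],
})
df2 = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3, 6, 7, 8],
    "Equipment": ["Jackhammer", "Jackhammer", "Cement Mixer", "Drill",
                  "Drill", "Drill", "Circular Saw", "Hammer", "Sander"],
    "Owner": ["James", "James", np.nan, "Sarah", "Sarah", "Sarah",
              "Alex", "Ken", "Ezra"],
    "Date": ["08/09/2020", "08/10/2020", "29/02/2020", "11/02/2020",
             "30/11/2020", "21/12/2020", "19/06/2020", "21/12/2020",
             "19/06/2020"],
    "Location": ["London", "London", "New York", np.nan, np.nan, np.nan,
                 "Moscow", "Toronto", "Frankfurt"],
})

# Merge on the columns that are fully populated in both frames.
keys = ["Item ID", "Equipment"]
data = pd.merge(df1, df2, how="outer", on=keys)

# Non-key columns present in both frames come back with _x/_y suffixes.
shared = [c for c in df1.columns if c in df2.columns and c not in keys]

# For each such pair, prefer the df2 value and fall back to the df1 value.
for col in shared:
    data[col] = data[f"{col}_y"].fillna(data[f"{col}_x"])
    data = data.drop(columns=[f"{col}_x", f"{col}_y"])

print(data)
```

The loop only saves typing when there are many overlapping columns; for two columns, writing out the fillna calls by hand as in the session above is just as readable.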
See also: pandas combine two columns with null values
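On combining two columns that contain nulls: pandas also has Series.combine_first, which keeps the caller's values and fills its gaps from the other Series. The tiny sketch below uses made-up values standing in for an Owner_x / Owner_y pair; I am not claiming this is what the linked post recommends, only that it is another way to express the same fallback.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a suffixed column pair after a merge.
owner_x = pd.Series(["Tim", "Sarah", np.nan], name="Owner_x")
owner_y = pd.Series([np.nan, "Sarah", "Ken"], name="Owner_y")

# Keep owner_y where it is present, otherwise take owner_x.
owner = owner_y.combine_first(owner_x).rename("Owner")

print(owner.tolist())
```

For two columns of the same frame this should give the same result as owner_y.fillna(owner_x); which one to use is mostly a matter of taste.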