How do you merge pandas DataFrames of different shapes?

Question

I'm trying to merge two large DataFrames in pandas, and it is causing me some problems. I'll try to illustrate with a smaller example.

df1 holds a list of equipment and several columns describing each item:

Item ID Equipment     Owner Status   Location
1       Jackhammer    James Active   London
2       Cement Mixer  Tim   Active   New York
3       Drill         Sarah Active   Paris
4       Ladder        Luke  Inactive Hong Kong
5       Winch         Kojo  Inactive Sydney
6       Circular Saw  Alex  Active   Moscow

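df2 holds a list of instances in which equipment was used. It shares some columns with df1, but some fields are NaN, and it also records uses of equipment that is not in df1: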

Item ID Equipment     Owner Date       Location
1       Jackhammer    James 08/09/2020 London
1       Jackhammer    James 08/10/2020 London
2       Cement Mixer  NaN   29/02/2020 New York
3       Drill         Sarah 11/02/2020 NaN
3       Drill         Sarah 30/11/2020 NaN
3       Drill         Sarah 21/12/2020 NaN
6       Circular Saw  Alex  19/06/2020 Moscow
7       Hammer        Ken   21/12/2020 Toronto
8       Sander        Ezra  19/06/2020 Frankfurt

I would like to end up with a DataFrame like this:

Item ID Equipment     Owner Status   Date       Location
1       Jackhammer    James Active   08/09/2020 London
1       Jackhammer    James Active   08/10/2020 London
2       Cement Mixer  Tim   Active   29/02/2020 New York
3       Drill         Sarah Active   11/02/2020 Paris
3       Drill         Sarah Active   30/11/2020 Paris
3       Drill         Sarah Active   21/12/2020 Paris
4       Ladder        Luke  Inactive NaN        Hong Kong
5       Winch         Kojo  Inactive NaN        Sydney
6       Circular Saw  Alex  Active   19/06/2020 Moscow
7       Hammer        Ken   NaN      21/12/2020 Toronto
8       Sander        Ezra  NaN      19/06/2020 Frankfurt

Instead, with the following code I get duplicated columns, which I think is because of the NaN values:

data = pd.merge(df1, df2, how='outer', on=['Item ID'])

Item ID Equipment_x  Equipment_y Owner_x Owner_y Status   Date       Location_x  Location_y
1       Jackhammer   NaN         James   James   Active   08/09/2020 London      London
1       Jackhammer   NaN         James   James   Active   08/10/2020 London      London
2       Cement Mixer NaN         Tim     NaN     Active   29/02/2020 New York    New York
3       Drill        NaN         Sarah   Sarah   Active   11/02/2020 Paris       NaN
3       Drill        NaN         Sarah   Sarah   Active   30/11/2020 Paris       NaN
3       Drill        NaN         Sarah   Sarah   Active   21/12/2020 Paris       NaN
4       Ladder       NaN         Luke    NaN     Inactive NaN        Hong Kong   Hong Kong
5       Winch        NaN         Kojo    NaN     Inactive NaN        Sydney      Sydney
6       Circular Saw NaN         Alex    NaN     Active   19/06/2020 Moscow      Moscow
7       NaN          Hammer      NaN     Ken     NaN      21/12/2020 NaN         Toronto
8       NaN          Sander      NaN     Ezra    NaN      19/06/2020 NaN         Frankfurt

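Ideally I could just drop the _y columns, but the data in the bottom rows means I would lose important information. The only thing I can think of is to merge the columns and force pandas to compare the values in each column and always favour the non-NaN value. I'm not sure whether that is possible?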

Tags: python, pandas, dataframe, merge

Answer


"merge the columns and force pandas to compare the values in each column and always favour the non-NaN value."

Is this what you mean?

In [45]: data = pd.merge(df1, df2, how='outer', on=['Item ID', 'Equipment'])

In [46]: data['Location'] = data['Location_y'].fillna(data['Location_x'])

In [47]: data['Owner'] = data['Owner_y'].fillna(data['Owner_x'])

In [48]: data = data.drop(['Location_x', 'Location_y', 'Owner_x', 'Owner_y'], axis=1)

In [49]: data
Out[49]:
    Item ID     Equipment    Status        Date   Location  Owner
0         1    Jackhammer    Active  08/09/2020     London  James
1         1    Jackhammer    Active  08/10/2020     London  James
2         2  Cement Mixer    Active  29/02/2020   New York    Tim
3         3         Drill    Active  11/02/2020      Paris  Sarah
4         3         Drill    Active  30/11/2020      Paris  Sarah
5         3         Drill    Active  21/12/2020      Paris  Sarah
6         4        Ladder  Inactive         NaN  Hong Kong   Luke
7         5         Winch  Inactive         NaN     Sydney   Kojo
8         6  Circular Saw    Active  19/06/2020     Moscow   Alex
9         7        Hammer       NaN  21/12/2020    Toronto    Ken
10        8        Sander       NaN  19/06/2020  Frankfurt   Ezra

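(As far as I know) you can't really merge on columns that contain nulls. However, you can use fillna to take a value and replace it with the value from the other column when it is NaN. It's not a very elegant solution, but it does at least seem to solve your example.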
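
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It rebuilds df1 and df2 from the tables in the question and applies the fillna step in a loop over every column the two frames share, rather than naming Location and Owner by hand. The `keys` and `shared` names are mine, and the final column order is only chosen to match the desired output in the question. Like the session above, it prefers the df2 value and falls back to df1.

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "Item ID": [1, 2, 3, 4, 5, 6],
    "Equipment": ["Jackhammer", "Cement Mixer", "Drill",
                  "Ladder", "Winch", "Circular Saw"],
    "Owner": ["James", "Tim", "Sarah", "Luke", "Kojo", "Alex"],
    "Status": ["Active", "Active", "Active",
               "Inactive", "Inactive", "Active"],
    "Location": ["London", "New York", "Paris",
                 "Hong Kong", "Sydney", "Moscow"],
})

df2 = pd.DataFrame({
    "Item ID": [1, 1, 2, 3, 3, 3, 6, 7, 8],
    "Equipment": ["Jackhammer", "Jackhammer", "Cement Mixer", "Drill",
                  "Drill", "Drill", "Circular Saw", "Hammer", "Sander"],
    "Owner": ["James", "James", np.nan, "Sarah", "Sarah", "Sarah",
              "Alex", "Ken", "Ezra"],
    "Date": ["08/09/2020", "08/10/2020", "29/02/2020", "11/02/2020",
             "30/11/2020", "21/12/2020", "19/06/2020", "21/12/2020",
             "19/06/2020"],
    "Location": ["London", "London", "New York", np.nan, np.nan, np.nan,
                 "Moscow", "Toronto", "Frankfurt"],
})

# Merge on the columns that identify an item in both frames.
keys = ["Item ID", "Equipment"]
data = pd.merge(df1, df2, how="outer", on=keys)

# Non-key columns present in both frames come back as <name>_x / <name>_y.
shared = [c for c in df1.columns if c in df2.columns and c not in keys]
for col in shared:
    # Prefer the df2 value (_y); fall back to the df1 value (_x) when it is NaN.
    data[col] = data[f"{col}_y"].fillna(data[f"{col}_x"])
    data = data.drop(columns=[f"{col}_x", f"{col}_y"])

# Reorder the columns to match the desired output in the question.
data = data[["Item ID", "Equipment", "Owner", "Status", "Date", "Location"]]
print(data)
```

Swapping `_x` and `_y` in the fillna line would prefer the df1 value instead, which matters only when the two frames disagree on a non-null value.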

See also: pandas combine two columns with null values
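
As a related sketch (not necessarily what the linked question recommends), `Series.combine_first` is a built-in way to combine two columns that contain nulls: it keeps the caller's values and fills the gaps from the other Series. The two small Series here are made up for illustration and stand in for a pair of `_x` / `_y` columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a pair of suffixed columns after a merge.
left = pd.Series(["Tim", np.nan, "Sarah"], name="Owner_x")
right = pd.Series([np.nan, "Ken", "Sarah"], name="Owner_y")

# Keep the values from `right`; where it is null, take the value from `left`.
owner = right.combine_first(left).rename("Owner")
print(owner.tolist())
```

For two Series with the same index this gives the same values as `right.fillna(left)`, so which one to use is mostly a matter of taste.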

