Is there a Pythonic way to iterate over two DataFrames and compare their rows?

Question

Given the following two DataFrames:

df1 # read from an Excel spreadsheet

import pandas as pd

data1 = {'ID':['1','2'],
         'Prod Family Desc':['Install','Maintenance'], 'Prod Family Code':['',''],
         'Prod Type Desc':['Installation Serice','Maintenance Service'],'Prod Type Code':['',''],
        }
df1 = pd.DataFrame(data1)
print(df1)

Resulting df1:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install                   Installation Serice
1  2      Maintenance                   Maintenance Service

df2 # the result of a SQL query

data2 = {'Prod Class':['F','F','T','T'],
        'Prod Desc':['Install','Maintenance','Installation Serice','Maintenance Service'],'Prod Code':['2525','2534','H123','H321']
        }

df2 = pd.DataFrame(data2) 
print(df2)

Resulting df2:

  Prod Class            Prod Desc Prod Code
0          F              Install      2525
1          F          Maintenance      2534
2          T  Installation Serice      H123
3          T  Maintenance Service      H321

What is the best way to fill the Prod Family Code and Prod Type Code columns of df1 with the matching Prod Code values from df2?

I am currently doing this:

# Split df2 into type (T) and family (F) rows
stype = df2.loc[df2['Prod Class'] == "T"]
family = df2.loc[df2['Prod Class'] == "F"]

for i, concaterow in df1.iterrows():
    # Copy the code of the matching type description into df1
    for j, styp in stype.iterrows():
        if concaterow['Prod Type Desc'] == styp['Prod Desc']:
            df1.loc[i, 'Prod Type Code'] = styp['Prod Code']

    # Copy the code of the matching family description into df1
    for j, scat in family.iterrows():
        if concaterow['Prod Family Desc'] == scat['Prod Desc']:
            df1.loc[i, 'Prod Family Code'] = scat['Prod Code']

print(df1)

The result is as expected:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install             2525  Installation Serice           H123
1  2      Maintenance             2534  Maintenance Service           H321

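Is there a more Pythonic way to do this?

Edit: in answer to @FatihAkici's question.

@FatihAkici - since df2 is the result of a SQL query, the result I expect is the value most recently inserted into the table. So, given df2 as follows: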

data2 = {'Prod Class':['F','F','F','T','T'],
        'Prod Desc':['Install','Maintenance','Install','Installation Serice','Maintenance Service'],
        'Prod Code':['2525','2534','2536','H123','H321']
        }

The expected result would be:

  ID Prod Family Desc Prod Family Code       Prod Type Desc Prod Type Code
0  1          Install             2536  Installation Serice           H123
1  2      Maintenance             2534  Maintenance Service           H321

Tags: python, pandas, dataframe

Answer


You can combine pd.DataFrame.assign and pd.DataFrame.merge:

# assign returns a new DataFrame, so the result is bound back to df1
df1 = df1.assign(**{
    "Prod Family Code": df1.merge(df2, left_on="Prod Family Desc", right_on="Prod Desc")["Prod Code"],
    "Prod Type Code": df1.merge(df2, left_on="Prod Type Desc", right_on="Prod Desc")["Prod Code"]})

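In your example, df1 already contains two empty columns, Prod Family Code and Prod Type Code, to receive the result, but this approach does not require them: assign creates the columns if they are missing.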
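
The merge above works for the first df2, where every description appears once. With the df2 from the question's edit, "Install" matches two rows, so the merge returns three rows for a two-row df1, and because assign aligns the new column on the index, the second row of df1 would receive 2536 instead of 2534. A minimal sketch of one way around this, not the method of the answer above: build a lookup Series per Prod Class and apply it with Series.map. The helper name code_lookup is hypothetical, and the sketch assumes that rows further down df2 were inserted later, which the SQL query would need to guarantee (for example with an ORDER BY).

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["1", "2"],
    "Prod Family Desc": ["Install", "Maintenance"],
    "Prod Family Code": ["", ""],
    "Prod Type Desc": ["Installation Serice", "Maintenance Service"],
    "Prod Type Code": ["", ""],
})

# df2 from the question's edit: "Install" appears twice in class F
df2 = pd.DataFrame({
    "Prod Class": ["F", "F", "F", "T", "T"],
    "Prod Desc": ["Install", "Maintenance", "Install",
                  "Installation Serice", "Maintenance Service"],
    "Prod Code": ["2525", "2534", "2536", "H123", "H321"],
})


def code_lookup(df, prod_class):
    """Return a Series mapping Prod Desc to Prod Code for one Prod Class.

    Assumption: later rows are newer, so the last duplicate is kept.
    """
    subset = df[df["Prod Class"] == prod_class]
    latest = subset.drop_duplicates(subset="Prod Desc", keep="last")
    return latest.set_index("Prod Desc")["Prod Code"]


# map looks each description up in the Series index;
# descriptions with no match in df2 become NaN
result = df1.assign(**{
    "Prod Family Code": df1["Prod Family Desc"].map(code_lookup(df2, "F")),
    "Prod Type Code": df1["Prod Type Desc"].map(code_lookup(df2, "T")),
})
print(result)
```

Filtering on Prod Class before the lookup also keeps a family description from being matched against a type row that happens to share the same text.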

