Merging two DataFrames based on two columns

Problem description

I have two DataFrames: df1 and df_out. df1 looks like this:

   phone_number1  phone_number2
1    123             123
2    345              0
3    678             678
4    912             912
5    555             666

df_out

    phone_number  address   name
1      123         add1     name1
2     777777       add2     name2
3      666         add3     name3
4      555         add4     name4

and I would like to produce something like this:

    phone_number  address   name    df1_phone_number1    df1_phone_number2
1      123         add1     name1        123                  123
2     777777       add2     name2        None                 None
3      666         add3     name3        None                 666
4      555         add4     name4        555                  None

I tried this code, but it does not keep the rows that match on phone_number2:

    mergedStuff3 = pd.concat([
        pd.merge(df1, df_out, how='right', left_on=['phone_number1'], right_on=['phone_number']),
        pd.merge(df1, df_out, how='right', left_on=['phone_number2', 'phone_number1'], right_on=['phone_number', 'phone_number'])
    ])

Tags: python, python-3.x, pandas, dataframe

Solution


I think you need Series.where, with the membership test done by Series.isin:

m1 = df_out['phone_number'].isin(df1['phone_number1'])
m2 = df_out['phone_number'].isin(df1['phone_number2'])
df_out['df1_phone_number1'] = df_out['phone_number'].where(m1)
df_out['df1_phone_number2'] = df_out['phone_number'].where(m2)
print (df_out)
   phone_number address   name  df1_phone_number1  df1_phone_number2
1           123    add1  name1              123.0              123.0
2        777777    add2  name2                NaN                NaN
3           666    add3  name3                NaN              666.0
4           555    add4  name4              555.0                NaN
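
The .0 suffixes in the output above appear because NaN forces the new columns to a float dtype. If integer values matter, one possible variant is to cast to pandas' nullable Int64 dtype before masking. The sketch below rebuilds the sample frames from the tables in the question (the 1-based index is an assumption) and shows only the first column:

```python
import pandas as pd

# Sample data reconstructed from the question's tables (index values assumed).
df1 = pd.DataFrame(
    {"phone_number1": [123, 345, 678, 912, 555],
     "phone_number2": [123, 0, 678, 912, 666]},
    index=range(1, 6),
)
df_out = pd.DataFrame(
    {"phone_number": [123, 777777, 666, 555],
     "address": ["add1", "add2", "add3", "add4"],
     "name": ["name1", "name2", "name3", "name4"]},
    index=range(1, 5),
)

# Membership test against the first df1 column.
m1 = df_out["phone_number"].isin(df1["phone_number1"])

# Casting to the nullable Int64 dtype lets missing entries become <NA>
# instead of NaN, so the matched numbers stay integers.
df_out["df1_phone_number1"] = df_out["phone_number"].astype("Int64").where(m1)
print(df_out)
```

The same pattern should apply to phone_number2 with its own mask.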

Or use numpy.where:

import numpy as np

m1 = df_out['phone_number'].isin(df1['phone_number1'])
m2 = df_out['phone_number'].isin(df1['phone_number2'])
df_out['df1_phone_number1'] = np.where(m1, df_out['phone_number'], None)
df_out['df1_phone_number2'] = np.where(m2, df_out['phone_number'], None)
print (df_out)
   phone_number address   name df1_phone_number1 df1_phone_number2
1           123    add1  name1               123               123
2        777777    add2  name2              None              None
3           666    add3  name3              None               666
4           555    add4  name4               555              None

If df1 has more columns, the solution can be generalized with a loop:

for c in df1.columns:
    m = df_out['phone_number'].isin(df1[c])
    df_out[f'df1_{c}'] = np.where(m, df_out['phone_number'], None)
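
For anyone who wants to run the loop end to end, here is a self-contained sketch on the sample data from the question. The frame construction is reconstructed from the tables shown there, and the 1-based index is an assumption:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's tables (index values assumed).
df1 = pd.DataFrame(
    {"phone_number1": [123, 345, 678, 912, 555],
     "phone_number2": [123, 0, 678, 912, 666]},
    index=range(1, 6),
)
df_out = pd.DataFrame(
    {"phone_number": [123, 777777, 666, 555],
     "address": ["add1", "add2", "add3", "add4"],
     "name": ["name1", "name2", "name3", "name4"]},
    index=range(1, 5),
)

# One new column per df1 column: keep the number where it occurs
# anywhere in that df1 column, otherwise store None.
for c in df1.columns:
    m = df_out["phone_number"].isin(df1[c])
    df_out[f"df1_{c}"] = np.where(m, df_out["phone_number"], None)

print(df_out)
```

Because None is mixed with integers, the new columns end up with object dtype. Note also that isin checks the whole df1 column rather than comparing row by row, which is what the expected output in the question calls for.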
