首页 > 解决方案 > 如何根据其中一列的子字符串合并 Pandas 数据框?

问题描述

我有 2 个数据框:df1 和 df2

df1
                  School Conference
0              Air Force   Mt. West
1                  Akron        MAC
2  Alabama at Birmingham      C-USA
3                 Auburn   Sun Belt

df2
                           SCHOOL_NAME           RATE
0                    Auburn University           93.0
1                    Air Force Academy           53.0
2                           Birmingham           75.0
3                  University of Akron           77.0

I would like to get the output below, basically binding the `RATE` column from df2 into df1 based on substring from School column
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0

我尝试了下面的代码,但它不起作用。当我运行它时,它似乎执行成功,但没有任何反应

for i in range(1, len(df1)):
    if df1['School'][i] in df2['SCHOOL_NAME']:
       pd.merge(df1, df2, how = 'left', left_on = 'School', right_on = 'SCHOOL_NAME')

标签: pythonpandasdataframe

解决方案


您可以使用列表推导来检查每个数据框的列是否in彼此(您也可以不区分大小写进行比较),然后合并:

df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x in y or y in x]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[1]: 
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0

您还可以通过添加和.lower()来不区分大小写地搜索:xy

df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x.lower() in y.lower()
                                                    or y.lower() in x.lower()]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[2]:
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0

每条评论一行代码:

df1 = (df1.assign(SCHOOL_NAME = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x.lower() in y.lower()
                                                    or y.lower() in x.lower()]).str[0])
          .merge(df2, how='left').drop('SCHOOL_NAME', axis=1))
df1
Out[3]: 
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0

推荐阅读