python - 如何根据其中一列的子字符串合并 Pandas 数据框?
问题描述
我有 2 个数据框:df1 和 df2
df1
School Conference
0 Air Force Mt. West
1 Akron MAC
2 Alabama at Birmingham C-USA
3 Auburn Sun Belt
df2
SCHOOL_NAME RATE
0 Auburn University 93.0
1 Air Force Academy 53.0
2 Birmingham 75.0
3 University of Akron 77.0
I would like to get the output below, basically binding the `RATE` column from df2 into df1 based on substring from School column
School Conference RATE
0 Air Force Mt. West 53.0
1 Akron MAC 77.0
2 Alabama at Birmingham C-USA 75.0
3 Auburn Sun Belt 93.0
我尝试了下面的代码,但它不起作用。当我运行它时,它似乎执行成功,但没有任何反应
for i in range(1, len(df1)):
if df1['School'][i] in df2['SCHOOL_NAME']:
pd.merge(df1, df2, how = 'left', left_on = 'School', right_on = 'SCHOOL_NAME')
解决方案
您可以使用列表推导来检查每个数据框的列是否in
彼此(您也可以不区分大小写进行比较),然后合并:
df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x in y or y in x]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[1]:
School Conference RATE
0 Air Force Mt. West 53.0
1 Akron MAC 77.0
2 Alabama at Birmingham C-USA 75.0
3 Auburn Sun Belt 93.0
您还可以通过添加和.lower()
来不区分大小写地搜索:x
y
df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x.lower() in y.lower()
or y.lower() in x.lower()]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[2]:
School Conference RATE
0 Air Force Mt. West 53.0
1 Akron MAC 77.0
2 Alabama at Birmingham C-USA 75.0
3 Auburn Sun Belt 93.0
每条评论一行代码:
df1 = (df1.assign(SCHOOL_NAME = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x.lower() in y.lower()
or y.lower() in x.lower()]).str[0])
.merge(df2, how='left').drop('SCHOOL_NAME', axis=1))
df1
Out[3]:
School Conference RATE
0 Air Force Mt. West 53.0
1 Akron MAC 77.0
2 Alabama at Birmingham C-USA 75.0
3 Auburn Sun Belt 93.0