Unable to generate a list showing any matches from a list

Problem description

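I am trying to match one column of a DataFrame against a list of values, if any of them are present. To do this I wrote a custom function called returnhits.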

def returnhits(a_list, long_string):
    matches =[]
    for match in a_list:
        if any(word in long_string.split() for match in a_list):
            matches.append(match)
    return ' , '.join(matches)
qualification_list = ('Professional Certificate', 'NiTEC ', "Bachelor's Degree", 'Diploma', 'Advanced/Higher/Graduate Diploma', 'Post Graduate Diploma' , 'Professional Degree', "Master's Degree" , 'Doctorate (PhD)')

However, I cannot produce the expected result.

df['Qualifications'] = df['Other information'].apply(lambda x : returnhits(qualification_list, x))

Ideally, if the text contains matches, it would return: NiTEC , Professional Degree
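
For reference, the function as posted fails because the generator inside `any()` refers to `word`, which is never defined (a NameError unless a global with that name happens to exist), and it loops over `a_list` a second time instead of testing the current item. A minimal sketch of a corrected plain-Python version follows. It assumes a case-sensitive substring test is what is intended, since splitting the text on whitespace would keep multi-word entries such as 'Professional Degree' from ever matching; the shortened list and sample sentence are made up for illustration.

```python
def returnhits(a_list, long_string):
    """Return the entries of a_list found in long_string, joined by ' , '."""
    # Strip stray whitespace (e.g. 'NiTEC ') so it does not block a match,
    # then keep every entry that occurs as a substring of the text.
    matches = [item.strip() for item in a_list if item.strip() in long_string]
    return ' , '.join(matches)


# Hypothetical sample data, shortened from the list in the question.
qualification_list = ('NiTEC ', 'Diploma', 'Professional Degree')
result = returnhits(qualification_list, 'Holds a NiTEC and a Professional Degree')
print(result)  # NiTEC , Professional Degree
```

Note that with a plain substring test, overlapping entries are all reported: a text containing 'Post Graduate Diploma' would also match 'Diploma'.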

Tags: python, pandas, dataframe, lambda, apply

Solution


Don't use a loop for this; use the pandas regular-expression string methods instead:

import re

import pandas as pd

df = pd.DataFrame({'Other information': ['something', ' blah blah NiTEC', 'other diploma']})
qualification_list = ('Professional Certificate', 'NiTEC', "Bachelor's Degree", 'Diploma', 'Advanced/Higher/Graduate Diploma', 'Post Graduate Diploma' , 'Professional Degree', "Master's Degree" , 'Doctorate (PhD)')

# Escape each name, join them into one alternation pattern, and extract the
# first case-insensitive match per row (NaN when nothing matches).
df['Qualifications'] = df['Other information'].str.extract('(%s)' % '|'.join(re.escape(s) for s in qualification_list), flags=re.IGNORECASE)
df

Output:

  Other information Qualifications
0         something            NaN
1   blah blah NiTEC          NiTEC
2     other diploma        diploma
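
Note that `str.extract` keeps only the first match in each row, whereas the question asks for every match joined together. One possible variation, offered as a sketch rather than as part of the original answer, is to use `str.findall` followed by `str.join`. The second sample row below is made up to show a row with two matches.

```python
import re

import pandas as pd

# Sample frame; the second row is a hypothetical example with two matches.
df = pd.DataFrame({'Other information': ['something',
                                          'blah NiTEC and a Professional Degree',
                                          'other diploma']})
qualification_list = ('Professional Certificate', 'NiTEC', "Bachelor's Degree",
                      'Diploma', 'Advanced/Higher/Graduate Diploma',
                      'Post Graduate Diploma', 'Professional Degree',
                      "Master's Degree", 'Doctorate (PhD)')

# One alternation pattern built from the escaped names.
pattern = '|'.join(re.escape(s) for s in qualification_list)

# findall returns a list of all non-overlapping matches per row;
# str.join turns each list into a single ' , '-separated string.
df['Qualifications'] = (
    df['Other information']
    .str.findall(pattern, flags=re.IGNORECASE)
    .str.join(' , ')
)
print(df)
```

Rows without any match end up with an empty string here rather than NaN, and matched text keeps the casing found in the row (for example 'diploma'), not the casing used in the list.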
