Fuzzy matching and iterating through a DataFrame

Problem description

I have these two DataFrames:

import pandas as pd

dico = {'Name': ['Arthur','Henri','Lisiane','Patrice'],
        "Age": ["20","18","62","73"],
        "Studies": ['Economics','Maths','Psychology','Medical']
        }
dico2 = {'Surname': ['Henri2','Arthur1','Patrice4','Lisiane3']}

dico = pd.DataFrame.from_dict(dico)
dico2 = pd.DataFrame.from_dict(dico2)

I want to fuzzy match the Surname strings to the corresponding Names, to get the following output:

      Name   Surname Age     Studies
0   Arthur   Arthur1  20   Economics
1    Henri    Henri2  18       Maths
2  Lisiane  Lisiane3  62  Psychology
3  Patrice  Patrice4  73     Medical

Here is my code so far:

dico['Surname'] = []
for i in dico2:
    lst = [0, 0, 0]
    for j in dico:
        if lst[0] < fuzz.ratio(i,j):
            lst[0] = fuzz.ratio(i,j)
            lst[1] = i
            lst[2] = j
    dico['Surname'].append(i)

But I get a ValueError: Length of values (0) does not match length of index (4), and I don't understand why. Thanks!

Tags: pandas, dictionary, iteration, fuzzywuzzy

Solution


import pandas as pd

dico = {'Name': ['Arthur','Henri','Lisiane','Patrice'],
        "Age": ["20","18","62","73"],
        "Studies": ['Economics','Maths','Psychology','Medical']
        }
dico2 = {'Surname': ['Henri2','Arthur1','Patrice4','Lisiane3']}

dico = pd.DataFrame.from_dict(dico)
dico2 = pd.DataFrame.from_dict(dico2)

# For each Name, collect the Surname values that contain it as a substring.
# This relies on every Name matching exactly one Surname.
matches = []
for name_str in dico.Name:
    matches.append(dico2[dico2.Surname.str.contains(name_str, regex=False)].Surname)

# Stack the matches into one 'Surname' Series with a fresh 0..n-1 index
temp = pd.concat(matches).reset_index(drop=True)

# Attach the matched surnames next to the original columns
dico = pd.concat([dico, temp], axis=1)
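
On the error itself: the ValueError in the question comes from `dico['Surname'] = []`. A list assigned as a column needs one value per row, and the empty list has 0 values against 4 rows. Separately, `for i in dico2` iterates over the column labels of a DataFrame, not over its cell values, so the loop never sees the individual surnames. Note also that the code above matches by substring containment rather than by a similarity score. The following is a minimal sketch of a score-based alternative, using the standard library's `difflib.SequenceMatcher` in place of `fuzz.ratio` and plain lists in place of the DataFrame columns; `best_match` is a hypothetical helper name, not part of any library.

```python
from difflib import SequenceMatcher

# Same values as the 'Name' and 'Surname' columns in the question
names = ['Arthur', 'Henri', 'Lisiane', 'Patrice']
surnames = ['Henri2', 'Arthur1', 'Patrice4', 'Lisiane3']


def best_match(name, candidates):
    """Return the candidate with the highest similarity ratio to `name`.

    Hypothetical helper for illustration; SequenceMatcher stands in for
    fuzz.ratio as the scoring function.
    """
    return max(candidates, key=lambda c: SequenceMatcher(None, name, c).ratio())


# One best-scoring surname per name, in the same order as `names`
matched = [best_match(name, surnames) for name in names]
print(matched)  # ['Arthur1', 'Henri2', 'Lisiane3', 'Patrice4']
```

Because `matched` has one entry per name, it could then be assigned as a column with `dico['Surname'] = matched`. This sketch always returns the top-scoring candidate, so it does not guard against two names picking the same surname or against poor matches; a minimum-score threshold would be a reasonable addition if the data is noisier.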

