Python NLTK: How to find the similarity between user input and Excel data

Question

I'm trying to build a Python chatbot. I have an Excel file with a few hundred rows that looks like this:

QuestionID   Question                                  Answer      Document
Q1           Where is London?                          In the UK   Google
Q2           How many football players on the pitch?   22          Google

Now, when a user types a question such as "Where is London?" or "where is london", I want it to return all the text in that row.

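I can print the contents of the file successfully, but I'm not sure how to loop over all the rows and find the one that is similar to, or matches, the user's question.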

import csv

text = []

# Read every row of the dataset into a list of (question, answer, document) tuples
with open("dataset.csv") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        text.append((row['Question'], row['Answer'], row['Document']))

print(text)

Tags: python, nltk

Answer


You probably don't want an exact match, since that would be case-sensitive and would require exact punctuation, no typos, and so on.
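To illustrate why exact matching is brittle, here is a minimal sketch using only the standard library. The `normalize` helper is hypothetical (it does not come from the question): it lowercases the text and drops ASCII punctuation, which takes care of case and punctuation differences but still does nothing for typos.

```python
import string

def normalize(text):
    """Lowercase, remove ASCII punctuation, and collapse whitespace."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())

stored = "Where is London?"
typed = "where is london"

# A plain comparison fails on case and punctuation alone
print(stored == typed)                        # False
# After normalization the two strings compare equal
print(normalize(stored) == normalize(typed))  # True
# A typo still breaks the match, which is where fuzzy scoring helps
print(normalize("Where is Lodnon?") == normalize(stored))  # False
```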

I would look at using fuzzywuzzy to compute a match score. You can then return the answer whose question matches the user's question best:

Example:

from fuzzywuzzy import fuzz
import pandas as pd

# Sample lookup table standing in for the real question/answer data
lookup_table = pd.DataFrame({
        'QuestionID':['Q1','Q2','Q3'],
        'Question':['Where is London?','Where is Rome?', 'How many football players on the pitch?'],
        'Answer':['In the UK','In Italy', 22],
        'Document':['Google','Google','Google']})

question = 'how many players on a football pitch?'

# Score each stored question against the user's question (0-100, higher is closer)
lookup_table['score'] = lookup_table.apply(lambda x : fuzz.ratio(x.Question, question), axis=1)
# Sort so that the best match comes first
lookup_table = lookup_table.sort_values('score', ascending=False)

Resulting table:

print (lookup_table.to_string())

  QuestionID                                 Question     Answer Document  score
2         Q3  How many football players on the pitch?         22   Google     71
0         Q1                         Where is London?  In the UK   Google     34
1         Q2                           Where is Rome?   In Italy   Google     27

To get the answer for the best match:

print (lookup_table.iloc[0]['Answer'])
22

Or, since you want the whole row:

print (lookup_table.head(1))
  QuestionID                                 Question Answer Document  score
2         Q3  How many football players on the pitch?     22   Google     71
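For the CSV-based setup from the question, the same idea can be sketched without pandas. This is only an illustration under stated assumptions: it uses `difflib.SequenceMatcher` from the standard library in place of fuzzywuzzy (so scores may differ from `fuzz.ratio`), the names `similarity` and `best_match` and the cutoff of 60 are hypothetical, and the sample rows stand in for what `csv.DictReader` would yield.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def best_match(rows, question, cutoff=60):
    """Return the row whose 'Question' is most similar to `question`,
    or None if no row scores at least `cutoff` (an assumed threshold)."""
    if not rows:
        return None
    best = max(rows, key=lambda row: similarity(row["Question"], question))
    if similarity(best["Question"], question) < cutoff:
        return None
    return best

# Sample rows shaped like the dictionaries produced by csv.DictReader
rows = [
    {"QuestionID": "Q1", "Question": "Where is London?",
     "Answer": "In the UK", "Document": "Google"},
    {"QuestionID": "Q2", "Question": "How many football players on the pitch?",
     "Answer": "22", "Document": "Google"},
]

print(best_match(rows, "where is london"))               # the Q1 row
print(best_match(rows, "What is the capital of Peru?"))  # None
```

The cutoff keeps the chatbot from answering with an unrelated row when nothing in the table is close; the right value depends on the data and would need tuning.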
