python - Python NLTK: How to find similarity between user input and Excel data
Question
I am trying to build a Python chatbot, and I have an Excel file with several hundred rows that looks like this:
QuestionID  Question                                 Answer     Document
Q1          Where is London?                         In the UK  Google
Q2          How many football players on the pitch?  22         Google
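Now, when a user types a question such as "Where is London?" or "Where is London" (without the question mark), I want it to return all the text in that row.
I can print the contents of the Excel file successfully, but I am not sure how to go through all the rows and find the one that is similar to, or matches, the user's question.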
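import csv

# Read every row of the dataset into a list of (question, answer, document) tuples.
text = []
with open("dataset.csv", newline="") as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        text.append((row['Question'], row['Answer'], row['Document']))
print(text)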
Answer
You probably don't want an exact match, since that would be case-sensitive and would require exact punctuation, no typos, and so on.
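To see why exact matching is brittle, a small illustration may help. The sample strings and the `normalize` helper below are hypothetical and not part of the original post; the point is that even after lowercasing and stripping punctuation, a reworded question still fails an equality test.

```python
import string

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace (hypothetical helper)."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())

stored = "Where is London?"

# Exact comparison fails on case and punctuation differences.
print(stored == "where is london")                         # False
# Normalizing both sides recovers this case ...
print(normalize(stored) == normalize("where is london"))   # True
# ... but not a rewording, which is where a similarity score helps.
print(normalize(stored) == normalize("London is where?"))  # False
```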
I would look at fuzzywuzzy to compute a match score; you can then return the answer for the best-matching question.
Example:
from fuzzywuzzy import fuzz
import pandas as pd

lookup_table = pd.DataFrame({
    'QuestionID': ['Q1', 'Q2', 'Q3'],
    'Question': ['Where is London?', 'Where is Rome?', 'How many football players on the pitch?'],
    'Answer': ['In the UK', 'In Italy', 22],
    'Document': ['Google', 'Google', 'Google']})
question = 'how many players on a football pitch?'
# Score each stored question against the user's question (0-100), best match first.
lookup_table['score'] = lookup_table.apply(lambda x: fuzz.ratio(x.Question, question), axis=1)
lookup_table = lookup_table.sort_values('score', ascending=False)
Resulting table:
print (lookup_table.to_string())
  QuestionID                                 Question     Answer Document  score
2         Q3  How many football players on the pitch?         22   Google     71
0         Q1                         Where is London?  In the UK   Google     34
1         Q2                           Where is Rome?   In Italy   Google     27
To get the answer for the best match:
print (lookup_table.iloc[0]['Answer'])
22
Or, since you want the whole row:
print (lookup_table.head(1))
  QuestionID                                 Question Answer Document  score
2         Q3  How many football players on the pitch?     22   Google     71
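Putting the pieces together, the lookup could be wrapped in a function that also declines to answer when nothing scores well. This is only a sketch under stated assumptions: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` so that it runs without extra packages, it reads an inline sample in place of `dataset.csv`, and the names `similarity` and `best_match` as well as the `min_score` cutoff of 60 are hypothetical choices that do not come from the answer above.

```python
import csv
import io
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0-100 similarity score, ignoring case (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def best_match(question, rows, min_score=60):
    """Return the row whose 'Question' is most similar, or None if below min_score."""
    if not rows:
        return None
    best = max(rows, key=lambda row: similarity(row["Question"], question))
    if similarity(best["Question"], question) < min_score:
        return None
    return best

# Inline sample standing in for dataset.csv (column names assumed from the question).
sample = io.StringIO(
    "QuestionID,Question,Answer,Document\n"
    "Q1,Where is London?,In the UK,Google\n"
    "Q2,How many football players on the pitch?,22,Google\n"
)
rows = list(csv.DictReader(sample))

# A close variant of a stored question returns that row.
match = best_match("where is london", rows)
print(match["Answer"] if match else "Sorry, I don't know.")

# An unrelated question falls below the cutoff, so no row is returned.
print(best_match("What time is it in Tokyo?", rows))
```

The cutoff is a judgment call: a lower value answers more questions but risks returning an unrelated row, so it is worth tuning against real user input.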