Phrase similarity in lists

Problem description

Hi, suppose I have two lists:

names = ['Daniel', 'Mario', 'Mandy', 'Jolene', 'Fabio']
places = ['on top of the table', 'France', 'valley of the kings']

and a dataframe containing some sentences, for example:

Dataframe

Index | Sent
0     | Mandy went to France on the Eiffel Tower
1     | Daniele was dancing on top of the box
2     | I am eating on top of the table
3     | Maria went to the valley of the kings

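I want to use a distance metric such as difflib to scan the sentences and compare their phrases against the lists within a set tolerance. Ideally, the result would be: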

Index | Sent                                     | Result
0     | Mandy went to France on the Eiffel Tower | Mandy
1     | Daniele was dancing on top of the box    | Daniel
2     | I am eating on top of the table          | on top of the table
3     | Maria went to the valley of the kings    | Mario, valley of the kings
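For reference, here is a quick check of difflib's `SequenceMatcher.ratio()` (which returns 2·M/T, where M is the number of matched characters and T the combined length of both strings) on pairs taken from the table. Comparing case-insensitively is an assumption of this sketch, not something the question specifies.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    # Case-insensitive similarity in [0, 1] via difflib's ratio()
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

pairs = [
    ("Daniele", "Daniel"),                         # wanted match
    ("Maria", "Mario"),                            # wanted match
    ("on top of the box", "on top of the table"),  # NOT wanted
]
for found, wanted in pairs:
    print(f"{found!r} vs {wanted!r}: {ratio(found, wanted):.3f}")
# 'Daniele' vs 'Daniel': 0.923
# 'Maria' vs 'Mario': 0.800
# 'on top of the box' vs 'on top of the table': 0.833
```

The unwanted pair scores higher than Maria/Mario, so no single global cutoff reproduces the desired table. Using a looser cutoff for the short names than for the multi-word places is one way around that.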

How would you approach this without writing lots of loops to find the phrase matches?

Tags: python, list, dataframe, similarity, difflib

Solution
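A minimal sketch using only the standard library: for each candidate phrase of k words, slide a k-word window over the lower-cased sentence and keep the phrase if the best `difflib.SequenceMatcher` ratio reaches a cutoff. The helper names (`best_window_score`, `find_matches`), the `(phrases, cutoff)` structure, and the cutoffs (0.75 for names, 0.85 for places) are hypothetical choices tuned to this sample, not values from the question. Short names lose more ratio per differing character, so they get the looser cutoff.

```python
from difflib import SequenceMatcher

def best_window_score(tokens, phrase):
    """Hypothetical helper: best ratio between `phrase` and any run of
    consecutive tokens with the same word count (0.0 if the sentence is shorter)."""
    target = phrase.lower()
    k = len(target.split())
    best = 0.0
    for start in range(len(tokens) - k + 1):
        window = " ".join(tokens[start:start + k])
        best = max(best, SequenceMatcher(None, window, target).ratio())
    return best

def find_matches(sentence, vocabularies):
    """Hypothetical helper: return every phrase whose best window score
    reaches the cutoff of the list it belongs to.

    vocabularies: list of (phrases, cutoff) pairs.
    """
    tokens = sentence.lower().split()  # naive tokenizer: no punctuation handling
    return [
        phrase
        for phrases, cutoff in vocabularies
        for phrase in phrases
        if best_window_score(tokens, phrase) >= cutoff
    ]

names = ['Daniel', 'Mario', 'Mandy', 'Jolene', 'Fabio']
places = ['on top of the table', 'France', 'valley of the kings']
vocab = [(names, 0.75), (places, 0.85)]  # assumed cutoffs; tune on real data

sentences = [
    "Mandy went to France on the Eiffel Tower",
    "Daniele was dancing on top of the box",
    "I am eating on top of the table",
    "Maria went to the valley of the kings",
]
for s in sentences:
    print(", ".join(find_matches(s, vocab)))
# Mandy, France
# Daniel
# on top of the table
# Mario, valley of the kings
```

With the dataframe, the same function can be applied per row, assuming the column is named `Sent`: `df["Result"] = df["Sent"].apply(lambda s: ", ".join(find_matches(s, vocab)))`. Note that row 0 also returns "France", because it appears verbatim in the sentence even though the desired table lists only "Mandy"; excluding it would need a rule beyond string similarity. Since difflib compares one pair of strings at a time, the loops are only hidden, not removed: the cost grows with sentences × phrases × windows, which is fine for short lists like these.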

