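python - How to compute the distance for each tokenized word and return the count of distances equal to 0 in a column
Problem description
I have two sets of text: descriptions stored in a DataFrame, and a separate list of words. For every word in each description I need to compute the Levenshtein distance to every word in the list, then return the number of results that equal 0.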
import pandas as pd
definitions=['very','similarity','seem','scott','hello','names']
# initialize list of lists
data = [['hello my name is Scott'], ['I went to the mall yesterday'], ['This seems very similar']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Descriptions'])
# print dataframe.
df
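The new column should hold, for each row, the number of (description word, definition word) pairs whose Levenshtein distance is 0:
# df['lev_count_0'] = <per row, the count of word pairs whose Levenshtein distance is 0>
For example, for the first word of the first row: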
edit_distance("hello","very") # This will be equal to 4
edit_distance("hello","similarity") # this will be equal to 9
edit_distance("hello","seem") # This will be equal to 4
edit_distance("hello","scott") # This will be equal to 5
edit_distance("hello","hello")# This will be equal to 0
edit_distance("hello","names") # this will be equal to 5
So for the first row, df['lev_count_0'] should be 1, because comparing every word in the description against the definitions list yields exactly one distance of 0.
Description | lev_count_0
hello my name is Scott | 1
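
The distances listed above can be reproduced without nltk. What follows is only a minimal sketch, assuming the usual unit-cost definition (insert, delete, and substitute each cost 1, no transpositions); the function name `levenshtein` is a hypothetical helper, not part of the original post.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance via row-by-row dynamic programming (hypothetical helper)."""
    # previous[j] = distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning the first i chars of a into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


definitions = ['very', 'similarity', 'seem', 'scott', 'hello', 'names']
# Distance from "hello" to every word in the definitions list
distances = {word: levenshtein("hello", word) for word in definitions}
print(distances)
```

The comparison is case-sensitive, so "Scott" in the description and "scott" in the list are one substitution apart and do not count as a match.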
Solution
My solution
from nltk import edit_distance
import pandas as pd

data = [['hello my name is Scott'], ['I went to the mall yesterday'], ['This seems very similar']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Descriptions'])

# Word list to compare against; the comparison is case-sensitive
dictionary = ['Hello', 'my']

def lev_dist(description):
    """Count (word, dictionary word) pairs whose edit distance is 0."""
    count = 0
    for word in description.split(" "):
        for dic in dictionary:
            if edit_distance(word, dic) == 0:
                count += 1
    return count

df['count_lev_0'] = df.Descriptions.apply(lev_dist)
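
Because a Levenshtein distance of 0 means the two strings are identical, the same count can probably be obtained with a plain set lookup and no distance computation at all. This is a sketch under two assumptions: the word list contains no duplicates (a duplicate would be counted twice by the nested loops above), and `count_exact_matches` is a hypothetical name introduced here for illustration.

```python
def count_exact_matches(text: str, vocabulary) -> int:
    """Count the words in text that appear verbatim (case-sensitive) in vocabulary."""
    vocab = set(vocabulary)  # set membership is an O(1) check per word
    return sum(1 for word in text.split() if word in vocab)


definitions = ['very', 'similarity', 'seem', 'scott', 'hello', 'names']
descriptions = [
    'hello my name is Scott',
    'I went to the mall yesterday',
    'This seems very similar',
]
# One count per description, in order
counts = [count_exact_matches(d, definitions) for d in descriptions]
print(counts)
```

With the DataFrame from the question, this could be applied as df['lev_count_0'] = df['Descriptions'].apply(lambda s: count_exact_matches(s, definitions)). If near matches (distance 1 or 2) are needed later, the edit_distance loop is still the way to go.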