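python - Counting co-occurrences between nouns and verbs/adjectives
Problem description
I have a data frame containing reviews, plus two lists: one holding nouns and the other holding verbs/adjectives.
Sample code: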
import pandas as pd

data = {'reviews': ['Very professional operation. Room is very clean and comfortable',
                    'Daniel is the most amazing host! His place is extremely clean, and he provides everything you could possibly want (comfy bed, guidebooks & maps, mini-fridge, towels, even toiletries). He is extremely friendly and helpful.',
                    'The room is very quiet, and well decorated, very clean.',
                    'He provides the room with towels, tea, coffee and a wardrobe.',
                    'Daniel is a great host. Always recomendable.',
                    'My friend and I were very satisfied with our stay in his apartment.']}
df = pd.DataFrame(data)

nouns = ['place', 'Amsterdam', 'apartment', 'location', 'host', 'stay', 'city', 'room', 'everything', 'time', 'house',
         'area', 'home', '’', 'center', 'restaurants', 'centre', 'Great', 'tram', 'très', 'minutes', 'walk', 'space', 'neighborhood',
         'à', 'station', 'bed', 'experience', 'hosts', 'Thank', 'bien']
verbs_adj = ['was', 'is', 'great', 'nice', 'had', 'clean', 'were', 'recommend', 'stay', 'are', 'good', 'perfect', 'comfortable',
             'have', 'easy', 'be', 'quiet', 'helpful', 'get', 'beautiful', "'s", 'has', 'est', 'located', 'un', 'amazing', 'wonderful']
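Using the data frame and the two lists, how can I write a function that returns, for each review, a dictionary of dictionaries giving the verb and adjective co-occurrence counts for each noun? My ideal output would be:
Example review: "A big restaurant served delicious food in big dishes"
>>> {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}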
Solution
You can try this:
from collections import Counter
from copy import deepcopy
from pprint import pprint

# data, nouns and verbs_adj are defined as in the question
data = ...
nouns = ...
verbs_adj = ...

def count_co_occurences(reviews):
    # For each review, count every space-separated lowercase token,
    # and attach those counts to each noun that appears in the review
    # (the noun test is a substring check on the lowercased review)
    occurences_per_review = {
        f"review_{i+1}": {
            noun: dict(Counter(review.lower().split(" ")))
            for noun in nouns
            if noun in review.lower()
        }
        for i, review in enumerate(reviews)
    }
    # Drop every token that is not in verbs_adj; iterate over a deep copy
    # so the dictionaries being modified are not the ones being looped over
    opr = deepcopy(occurences_per_review)
    for review, occurences in opr.items():
        for noun, counts in occurences.items():
            for verb_adj in counts.keys():
                if verb_adj not in verbs_adj:
                    del occurences_per_review[review][noun][verb_adj]
    return occurences_per_review

pprint(count_co_occurences(data["reviews"]))
# Output
{'review_1': {'room': {'clean': 1, 'comfortable': 1, 'is': 1}},
 'review_2': {'bed': {'amazing': 1, 'is': 3},
              'everything': {'amazing': 1, 'is': 3},
              'host': {'amazing': 1, 'is': 3},
              'place': {'amazing': 1, 'is': 3}},
 'review_3': {'room': {'is': 1}},
 'review_4': {'room': {}},
 'review_5': {'host': {'great': 1, 'is': 1}},
 'review_6': {'apartment': {'stay': 1, 'were': 1},
              'stay': {'stay': 1, 'were': 1}}}
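Note that in the output above, words followed by punctuation (for example "clean," in review 2 or "quiet," in review 3) are not counted, because the review is split on single spaces only. A possible variation, offered as a sketch rather than a definitive fix, tokenizes with a regular expression and compares everything in lowercase. The function name count_cooccurrences and its parameters are hypothetical, and the choice of `\w+` as the token pattern is an assumption (it will never match an entry such as "'s"). As in the answer above, co-occurrence here still means "appears anywhere in the same review".

```python
import re
from collections import Counter


def count_cooccurrences(reviews, nouns, verbs_adj):
    """For each review, map every listed noun found in it to the counts
    of listed verbs/adjectives found in the same review."""
    # Compare in lowercase so capitalised list entries can still match.
    noun_set = {n.lower() for n in nouns}
    target_set = {w.lower() for w in verbs_adj}

    result = {}
    for i, review in enumerate(reviews, start=1):
        # \w+ keeps runs of word characters, so "clean," becomes "clean".
        tokens = re.findall(r"\w+", review.lower())
        counts = Counter(t for t in tokens if t in target_set)
        # Whole-token match instead of a substring test on the review.
        present = sorted(noun_set.intersection(tokens))
        result[f"review_{i}"] = {noun: dict(counts) for noun in present}
    return result


# Usage with the example sentence from the question
example = ["A big restaurant served delicious food in big dishes"]
print(count_cooccurrences(example, ["restaurant"], ["big", "served", "delicious"]))
# {'review_1': {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}}
```

To run it on the data frame from the question, one could pass df["reviews"] together with the nouns and verbs_adj lists.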