Counting co-occurrences between nouns and verbs/adjectives

Problem description

I have a DataFrame containing reviews, plus two lists: one holds nouns and the other holds verbs/adjectives.

Sample code:

import pandas as pd

data = {'reviews':['Very professional operation. Room is very clean and comfortable',
                    'Daniel is the most amazing host! His place is extremely clean, and he provides everything you could possibly want (comfy bed, guidebooks & maps, mini-fridge, towels, even toiletries). He is extremely friendly and helpful.',
                    'The room is very quiet, and well decorated, very clean.',
                    'He provides the room with towels, tea, coffee and a wardrobe.',
                    'Daniel is a great host. Always recomendable.',
                    'My friend and I were very satisfied with our stay in his apartment.']}

df = pd.DataFrame(data)
nouns = ['place','Amsterdam','apartment','location','host','stay','city','room','everything','time','house',
         'area','home','’','center','restaurants','centre','Great','tram','très','minutes','walk','space','neighborhood',
         'à','station','bed','experience','hosts','Thank','bien']

verbs_adj = ['was','is','great','nice','had','clean','were','recommend','stay','are','good','perfect','comfortable',
             'have','easy','be','quiet','helpful','get','beautiful',"'s",'has','est','located','un','amazing','wonderful',]

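Using the DataFrame and the two lists, how can I write a function that returns, for each review, a dictionary of dictionaries giving the verb and adjective co-occurrence counts for each noun? My ideal output is: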

Sample review: "A big restaurant served delicious food in big dishes"

>>> {'restaurant': {'big': 2, 'served': 1, 'delicious': 1}}

Tags: python, pandas, nltk

Solution


You can try this:

from collections import Counter
from copy import deepcopy
from pprint import pprint

data = ...
nouns = ...
verbs_adj = ...

def count_co_occurences(reviews):
    # For each review, map every noun it contains to a word count of that review
    occurences_per_review = {
        f"review_{i+1}": {
            noun: dict(Counter(review.lower().split(" ")))
            for noun in nouns
            if noun in review.lower()
        }
        for i, review in enumerate(reviews)
    }
    # Drop counted words that are not in verbs_adj; iterate over a deep copy
    # so entries can be deleted from the original dict safely
    opr = deepcopy(occurences_per_review)
    for review, occurences in opr.items():
        for noun, counts in occurences.items():
            for verb_adj in counts.keys():
                if verb_adj not in verbs_adj:
                    del occurences_per_review[review][noun][verb_adj]
    return occurences_per_review


pprint(count_co_occurences(data["reviews"]))
# Output
{'review_1': {'room': {'clean': 1, 'comfortable': 1, 'is': 1}},
 'review_2': {'bed': {'amazing': 1, 'is': 3},
              'everything': {'amazing': 1, 'is': 3},
              'host': {'amazing': 1, 'is': 3},
              'place': {'amazing': 1, 'is': 3}},
 'review_3': {'room': {'is': 1}},
 'review_4': {'room': {}},
 'review_5': {'host': {'great': 1, 'is': 1}},
 'review_6': {'apartment': {'stay': 1, 'were': 1},
              'stay': {'stay': 1, 'were': 1}}}
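
Note that in the output above review_3 only reports 'is': splitting on a single space leaves punctuation attached, so 'quiet,' and 'clean.' never match the entries in verbs_adj. The check `noun in review.lower()` is also a substring test, so capitalized nouns such as 'Amsterdam' can never match the lowercased text. If that matters for your data, here is a minimal sketch of a variant that tokenizes with a regular expression first. The function name `count_co_occurrences_tokenized` and its parameters are my own, not part of the answer above, and it assumes that simple `\w+` tokens are good enough (entries such as "'s" will not be matched this way).

```python
import re
from collections import Counter


def count_co_occurrences_tokenized(reviews, nouns, verbs_adj):
    """Per review, map each noun present as a whole word to counts of listed verbs/adjectives."""
    # Lowercase both lists so that entries like 'Amsterdam' can match lowercased text
    noun_set = {n.lower() for n in nouns}
    verb_adj_set = {v.lower() for v in verbs_adj}
    result = {}
    for i, review in enumerate(reviews, start=1):
        # \w+ keeps runs of word characters, so "clean," and "clean." both become "clean"
        tokens = re.findall(r"\w+", review.lower())
        # Count only the tokens that appear in the verb/adjective list
        counts = Counter(t for t in tokens if t in verb_adj_set)
        # Whole-word match: a noun counts only if it is one of the tokens
        present = sorted(noun_set.intersection(tokens))
        result[f"review_{i}"] = {noun: dict(counts) for noun in present}
    return result


example = ["The room is very quiet, and well decorated, very clean."]
print(count_co_occurrences_tokenized(example, ["room", "host"], ["is", "quiet", "clean"]))
# {'review_1': {'room': {'is': 1, 'quiet': 1, 'clean': 1}}}
```

As in the answer above, every noun found in a review gets the same counts, because the words are counted over the whole review rather than near each noun. Restricting the count to a window around each noun, or using a part-of-speech tagger, would be a different approach.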

