How to count matching words across 2 CSV files

Problem description

I have 2 CSV files, dictionary.csv and story.csv. For each row of story.csv, I want to count how many words match a word in dictionary.csv.

Below is a truncated example

Story.csv 
id    STORY
0     Jennie have 2 shoes, a red heels and a blue sneakers
1     The skies are pretty today
2     One of aesthetic color is grey

Dictionary.csv
red
green
grey
blue
black

My expected output is

output.csv
id    STORY                                                  Found
0     Jennie have 2 shoes, a red heels and a blue sneakers    2
1     The skies are pretty today                              0
2     One of aesthetic color is grey                          1

This is my code so far, but I only get NaN (empty cells)

import pandas as pd 
import csv

news=pd.read_csv("Story.csv") 
dictionary=pd.read_csv("Dictionary.csv")


news['STORY'].value_counts()

news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())

news.to_csv("output.csv")

I also tried .str.count, but I keep getting zeros

Tags: python-3.x, pandas, csv

Solution


Try this

import pandas as pd

# create the sample data frame
data = {
    'id': [0, 1, 2],
    'STORY': [
        'Jennie have 2 shoes, a red heels and a blue sneakers',
        'The skies are pretty today',
        'One of aesthetic color is grey',
    ],
}

word_list = ['red', 'green', 'grey', 'blue', 'black']

df = pd.DataFrame(data)

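# count, per story, the total occurrences of every dictionary word
# note: str.count matches substrings, so 'red' would also be counted inside 'hundred'
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())
# alternatively, skip the intermediate Series and sum the counts directly
#df['Found'] = df['STORY'].astype(str).apply(lambda t: sum(t.count(word) for word in word_list))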

Output

df
#   id  STORY                                                Found
#0  0   Jennie have 2 shoes, a red heels and a blue sneakers 2
#1  1   The skies are pretty today                           0
#2  2   One of aesthetic color is grey                       1
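
The `t.count(word)` call above counts substring occurrences, which is fine for the sample data but would also count 'red' inside a word such as 'hundred'. If only whole words should match, one possible variant is a regular expression with word boundaries. This is a sketch, not part of the original answer; the helper name `count_whole_words` is made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 2],
    'STORY': [
        'Jennie have 2 shoes, a red heels and a blue sneakers',
        'The skies are pretty today',
        'One of aesthetic color is grey',
    ],
})
word_list = ['red', 'green', 'grey', 'blue', 'black']

# Build one alternation pattern: \b(?:red|green|grey|blue|black)\b
# re.escape guards against dictionary entries containing regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(w) for w in word_list) + r')\b')


def count_whole_words(text):
    """Hypothetical helper: number of whole-word dictionary hits in text."""
    return len(pattern.findall(str(text)))


df['Found'] = df['STORY'].apply(count_whole_words)
print(df)
```

The match is case-sensitive, like the original `t.count`; passing `flags=re.IGNORECASE` to `re.compile` would make 'Red' count as well, if that is what you want.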

Bonus edit: if you want a detailed per-word breakdown of the counts, run this

df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))

#   red     green   grey    blue    black
#0  1       0       0       1       0
#1  0       0       0       0       0
#2  0       0       1       0       0
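
The answer builds its data inline, so here is a rough sketch of the same counting applied to the two files from the question. It rests on assumptions that the question does not confirm: that Story.csv is comma-separated with `id` and `STORY` columns, and that Dictionary.csv has one word per line and no header row. The column name `word` is made up here. The `io.StringIO` objects only stand in for the real files so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-ins for the real files; replace with "Story.csv" / "Dictionary.csv".
story_file = io.StringIO(
    'id,STORY\n'
    '0,"Jennie have 2 shoes, a red heels and a blue sneakers"\n'
    '1,The skies are pretty today\n'
    '2,One of aesthetic color is grey\n'
)
dictionary_file = io.StringIO('red\ngreen\ngrey\nblue\nblack\n')

news = pd.read_csv(story_file)

# Assumption: no header row, so name the single column ourselves ('word' is arbitrary).
dictionary = pd.read_csv(dictionary_file, header=None, names=['word'])
word_list = dictionary['word'].astype(str).str.strip().tolist()

# Same substring-based counting as in the answer above.
news['Found'] = news['STORY'].astype(str).apply(
    lambda t: sum(t.count(word) for word in word_list)
)

# With no path, to_csv returns the CSV text; pass "output.csv" to write a file.
csv_text = news.to_csv(index=False)
print(csv_text)
```

If Dictionary.csv does have a header (the question's code refers to a `Lists` column), drop `header=None, names=[...]` and read that column instead. `index=False` keeps pandas from writing its row index as an extra column.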

