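python-3.x - How to count matching words across 2 csv files
Question
I have 2 csv files, dictionary.csv and story.csv. For each row of story.csv, I want to count how many of its words match a word in dictionary.csv.
Below is a truncated sample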
Story.csv
id STORY
0 Jennie have 2 shoes, a red heels and a blue sneakers
1 The skies are pretty today
2 One of aesthetic color is grey
Dictionary.csv
red
green
grey
blue
black
My expected output is
output.csv
id STORY Found
0 Jennie have 2 shoes, a red heels and a blue sneakers 2
1 The skies are pretty today 0
2 One of aesthetic color is grey 1
This is my code so far, but all I get is NaN (empty cells)
import pandas as pd
import csv
news=pd.read_csv("Story.csv")
dictionary=pd.read_csv("Dictionary.csv")
news['STORY'].value_counts()
news['How many found in 1'] = dictionary['Lists'].map(news['STORY'].value_counts())
news.to_csv("output.csv")
I also tried using .str.count, but I keep getting zeros
Answer
Try this
import pandas as pd
#create the sample data frame
data = {'id': [0, 1, 2],
        'STORY': ['Jennie have 2 shoes, a red heels and a blue sneakers',
                  'The skies are pretty today',
                  'One of aesthetic color is grey']}
word_list = ['red', 'green', 'grey', 'blue', 'black']
df = pd.DataFrame(data)
#start counting
df['Found'] = df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}).sum())
#alternatively, can use this
#df['Found'] = df['STORY'].astype(str).apply(lambda t: sum([t.count(word) for word in word_list]))
Output
df
# id STORY Found
#0 0 Jennie have 2 shoes, a red heels and a blue sneakers 2
#1 1 The skies are pretty today 0
#2 2 One of aesthetic color is grey 1
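One caveat: `t.count(word)` counts substrings, so a story containing "tired" would also add to the count for "red". If only whole words should match, a regular expression with word boundaries is one option. The sketch below is not part of the original answer; the helper name `count_whole_words` and the case-insensitive matching are assumptions made for illustration.

```python
import re
import pandas as pd

word_list = ['red', 'green', 'grey', 'blue', 'black']
df = pd.DataFrame({
    'id': [0, 1, 2],
    'STORY': [
        'Jennie have 2 shoes, a red heels and a blue sneakers',
        'The skies are pretty today',
        'One of aesthetic color is grey',
    ],
})

# One alternation pattern for all words; \b restricts matches to whole words.
# re.escape guards against dictionary entries containing regex metacharacters.
pattern = r'\b(?:' + '|'.join(re.escape(w) for w in word_list) + r')\b'

def count_whole_words(text):
    """Hypothetical helper: count whole-word, case-insensitive matches in one string."""
    return len(re.findall(pattern, str(text), flags=re.IGNORECASE))

# Vectorised equivalent for the whole column.
df['Found'] = df['STORY'].astype(str).str.count(pattern, flags=re.IGNORECASE)

# Plain substring counting would also find "red" inside "tired".
substring_count = sum('I am tired of the red door'.count(w) for w in word_list)
whole_word_count = count_whole_words('I am tired of the red door')
```

On the sample data both approaches agree; they only differ when a dictionary word appears inside a longer word.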
Bonus edit: if you want a detailed breakdown of the counts per word, run this
df['STORY'].astype(str).apply(lambda t: pd.Series({word: t.count(word) for word in word_list}))
# red green grey blue black
#0 1 0 0 1 0
#1 0 0 0 0 0
#2 0 0 1 0 0
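The answer builds the sample data in memory, while the question starts from two files. The following is a minimal sketch of the same counting applied to CSV input. It assumes Dictionary.csv has a header named `Lists` (the column the question's code refers to) and that Story.csv is comma-separated with quoted stories; neither is confirmed by the truncated sample. `io.StringIO` stands in for the real files so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-ins for Story.csv and Dictionary.csv; replace with the real file paths.
story_csv = io.StringIO(
    'id,STORY\n'
    '0,"Jennie have 2 shoes, a red heels and a blue sneakers"\n'
    '1,The skies are pretty today\n'
    '2,One of aesthetic color is grey\n'
)
dictionary_csv = io.StringIO('Lists\nred\ngreen\ngrey\nblue\nblack\n')

news = pd.read_csv(story_csv)
dictionary = pd.read_csv(dictionary_csv)

# Assumed column name 'Lists'; drop blanks and stray whitespace.
word_list = dictionary['Lists'].dropna().astype(str).str.strip().tolist()

# Same substring counting as in the answer, summed over all dictionary words.
news['Found'] = news['STORY'].astype(str).apply(
    lambda t: sum(t.count(word) for word in word_list)
)

# With no path, to_csv returns the CSV text; pass "output.csv" to write a file.
csv_text = news.to_csv(index=False)
```

Passing `index=False` keeps the extra pandas index column out of the output, so the file has only `id`, `STORY` and `Found`.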