python - Iterate through a list of keywords, search a string for matches, and compute the final total
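Problem description
I have a list of terms and want to check whether they appear in a research abstract and, if they do, count the number of occurrences. I'm not sure what my code is doing wrong, but it isn't counting correctly. Thanks in advance!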
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
'smi', 'oud', 'opioid' ]
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
for mh in mh_terms:
    mh = mh.lower
    mh = str(mh)
    number_of_occurences = 0
    for word in singleabstract.split():
        if mh in word:
            number_of_occurences += 1
print(number_of_occurences)
Solution
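First, a note on why the loop in the question reports 0: `mh.lower` without parentheses is the method object itself, so `str(mh)` becomes text like `<built-in method lower of str object at 0x...>`, which never appears in any word of the abstract. The following is a minimal sketch of that effect and of one possible repair, not the asker's final code. The names `broken`, `broken_hits`, and `total` are introduced here for illustration, and the substring test `term in word` is kept from the question.

```python
mh_terms = ['mental', 'ptsd', 'sud', 'substance abuse', 'drug abuse',
            'alcohol', 'alcoholism', 'anxiety', 'depressing', 'bipolar', 'mh',
            'smi', 'oud', 'opioid']
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')

# Without parentheses, lower is never called: this is the repr of a method
broken = str('mental'.lower)
broken_hits = sum(broken in word for word in singleabstract.split())

# Calling lower() and keeping one running total outside both loops
total = 0
for term in mh_terms:
    term = term.lower()
    for word in singleabstract.lower().split():
        if term in word:  # substring test, as in the question
            total += 1

print(broken_hits, total)
```

Because `term in word` is a substring test, a word such as "alcoholism" would count for both 'alcohol' and 'alcoholism', and multi-word terms such as 'substance abuse' can never match a single whitespace-separated word.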
In general, a dict is a good approach for grouping. For counting, you could use an implementation like the following:
c = {}
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
for s in singleabstract.split():
    s = ''.join(char for char in s.lower() if char.isalpha())  # '<punctuation>'.isalpha() yields False
    # you'll need to check if the word is in the dict
    # first, and set it to 1
    if s not in c:
        c[s] = 1
    # otherwise, increment the existing value by 1
    else:
        c[s] += 1
# You can sum the number of occurrences, but you'll need
# to use c.get to avoid KeyErrors
occurrences = sum(c.get(term, 0) for term in mh_terms)
occurrences
3
# or you can use an if in the generator expression
occurrences = sum(c[term] for term in mh_terms if term in c)
The best way to count occurrences is to use collections.Counter. It is a dict subclass, so checking a key is O(1) on average:
from collections import Counter
singleabstract = ('This is a research abstract that includes words like '
                  'mental health and anxiety. My hope is that I get my code to work and '
                  'not resort to alcohol.')
# the Counter can consume a generator expression analogous to
# the for loop in the dict implementation
c = Counter(''.join(char for char in s.lower() if char.isalpha())
            for s in singleabstract.split())
# Then you can iterate through
for term in mh_terms:
    # don't need to use get, as Counter will return 0
    # for missing keys, rather than raising KeyError
    print(term, c[term])
mental 1
ptsd 0
sud 0
substance abuse 0
drug abuse 0
alcohol 1
alcoholism 0
anxiety 1
depressing 0
bipolar 0
mh 0
smi 0
oud 0
opioid 0
To get the desired output, you can sum the values of the Counter object:
total_occurrences = sum(c[v] for v in mh_terms)
total_occurrences
3
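One limitation of splitting on whitespace is that multi-word terms such as 'substance abuse' always count as 0, because no single word can equal the whole phrase. A possible workaround, sketched here under assumptions rather than taken from the answer above, is to search the full text with a word-boundary regular expression per term. The helper name `count_terms` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count whole-word, case-insensitive matches of each term in text."""
    counts = Counter()
    for term in terms:
        # \b keeps 'alcohol' from matching inside 'alcoholism';
        # re.escape guards against regex metacharacters in a term
        pattern = r'\b' + re.escape(term) + r'\b'
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample text, not from the question
abstract = ('Substance abuse and alcohol use often co-occur with anxiety. '
            'Treating substance abuse early may reduce alcoholism.')
counts = count_terms(abstract, ['substance abuse', 'alcohol', 'alcoholism', 'anxiety'])
total = sum(counts.values())
print(counts, total)
```

This scans the text once per term, which is fine for a short term list and a single abstract; for many long documents a single combined pattern might be worth considering.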