python - Check a string for whole words only
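Problem description
Checkio training. The task is called Popular Words: count how often each word from a given list (of strings) occurs in a given string.
For example: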
textt="When I was One I had just begun When I was Two I was nearly new"
wwords=['i', 'was', 'three', 'near']
My code is as follows:
def popular_words(text: str, words: list) -> dict:
    # your code here
    occurence = {}
    text = text.lower()
    for i in words:
        occurence[i] = text.count(i)  # incorrectly counts "nearly" as "near"
    print(occurence)
    return occurence

popular_words(textt, wwords)
It almost works, returning
{'i': 4, 'was': 3, 'three': 0, 'near': 1}
So it counts the "near" inside "nearly". This is evidently deliberate on the task author's part. However, I can't find a way around it other than:
"search for words that are not first (index 0) or last (last index) and for these that begin/end with whitespace"
Could I ask for some help, please? Ideally building on this fairly naive code.
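The whitespace-boundary idea quoted above does not need special cases for the first and last word: regex lookarounds treat the start and end of the string as boundaries as well. The following is a minimal sketch that keeps the original loop; it assumes a "whole word" means a run of non-space characters, so punctuation glued to a word (as in "One,") prevents a match.

```python
import re

def popular_words(text: str, words: list) -> dict:
    occurrence = {}
    text = text.lower()
    for word in words:
        # (?<!\S): preceded by whitespace or the start of the string.
        # (?!\S):  followed by whitespace or the end of the string.
        # re.escape guards against words containing regex metacharacters.
        pattern = r"(?<!\S)" + re.escape(word) + r"(?!\S)"
        occurrence[word] = len(re.findall(pattern, text))
    return occurrence

textt = "When I was One I had just begun When I was Two I was nearly new"
wwords = ['i', 'was', 'three', 'near']
print(popular_words(textt, wwords))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Because lookarounds do not consume the surrounding spaces, adjacent repeats such as "i i i" are all counted (3), whereas padding the text and counting " i " with str.count finds only 2. Using \b instead of the lookarounds switches to word-character boundaries, so "one," would then match "one".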
Solution
You'd be better off splitting your sentence into words and counting whole words rather than substrings:
textt="When I was One I had just begun When I was Two I was nearly new"
wwords=['i', 'was', 'three', 'near']
text_words = textt.lower().split()
result = {w:text_words.count(w) for w in wwords}
print(result)
Prints:
{'i': 4, 'was': 3, 'three': 0, 'near': 0}
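To see why splitting fixes the "near" case, compare str.count, which counts substrings, with list.count, which compares whole list items. The function is a sketch that wraps the one-liner above in the Checkio signature from the question.

```python
def popular_words(text: str, words: list) -> dict:
    # Split on whitespace so the text becomes a list of whole words;
    # list.count compares complete items, so "near" never equals "nearly".
    text_words = text.lower().split()
    return {w: text_words.count(w) for w in words}

text = "When I was One I had just begun When I was Two I was nearly new"
print(text.lower().count("near"))          # 1: substring match inside "nearly"
print(text.lower().split().count("near"))  # 0: no whole word equals "near"
print(popular_words(text, ['i', 'was', 'three', 'near']))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Each list.count scans the whole word list, so the cost grows with (number of requested words) × (number of words in the text); the collections.Counter variant below avoids the repeated scans.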
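If the text contains punctuation, it is better to use a regular expression that splits the string on runs of non-alphanumeric characters: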
import re
textt="When I was One, I had just begun.I was Two when I was nearly new"
wwords=['i', 'was', 'three', 'near']
text_words = re.split(r"\W+", textt.lower())  # raw string: "\W" is an invalid escape in a normal string
result = {w:text_words.count(w) for w in wwords}
Result:
{'i': 4, 'was': 3, 'three': 0, 'near': 0}
Another option is to use findall on the word characters instead:
text_words = re.findall(r"\w+", textt.lower())
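The two regex approaches differ at the edges of the text: re.split yields an empty string when the text starts or ends with a separator, while re.findall returns only the words. The sample sentence here is made up for illustration.

```python
import re

text = "Well, I was One, I had just begun."
# The trailing "." is a separator, so split leaves an empty string at the end.
split_words = re.split(r"\W+", text.lower())
# findall returns only the runs of word characters themselves.
found_words = re.findall(r"\w+", text.lower())
print(split_words)
# ['well', 'i', 'was', 'one', 'i', 'had', 'just', 'begun', '']
print(found_words)
# ['well', 'i', 'was', 'one', 'i', 'had', 'just', 'begun']
```

The stray '' does not change the counts unless the word list itself contains an empty string. Both patterns treat an apostrophe as a separator, so "don't" becomes "don" and "t".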
Now, if your list of "important" words is large, it may be better to count all the words once with the classic collections.Counter and then filter:
import collections
import re
text_words = collections.Counter(re.split(r"\W+", textt.lower()))
result = {w: text_words[w] for w in wwords}  # a Counter gives 0 for missing words; .get(w) would give None
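Putting the pieces together in the Checkio signature, here is a sketch that counts every word once with a Counter (a single pass over the text rather than one list.count scan per requested word) and uses findall to avoid the empty-string artifact of re.split:

```python
import collections
import re

def popular_words(text: str, words: list) -> dict:
    # One pass: tally every word in the lower-cased text.
    counts = collections.Counter(re.findall(r"\w+", text.lower()))
    # Look up only the requested words; a missing key yields 0.
    return {w: counts[w] for w in words}

textt = "When I was One, I had just begun.I was Two when I was nearly new"
print(popular_words(textt, ['i', 'was', 'three', 'near']))
# {'i': 4, 'was': 3, 'three': 0, 'near': 0}
```

Indexing a Counter with a missing key returns 0 without inserting that key, which is why counts[w] is used here; counts.get(w, 0) would work as well.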