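python - Text replacement does not work in special cases
Problem description
I have a word list file named Words.txt that contains hundreds of words, plus some subtitle files (.srt). I want to go through all the subtitle files and search them for every word in the word list. Whenever a word is found, I want to change its color to green. This is the code: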
import fileinput
import os
import re

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'

# Read the word list, one word per line.
wordList = []
with open(wordsPath, 'r') as wordFile:
    for line in wordFile:
        line = line.strip()
        wordList.append(line)

for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
                # os.walk yields bare file names, so join them with root.
                filePath = os.path.join(root, fileName)
                with open(filePath, 'r') as file:
                    filedata = file.read()
                # Only matches the exact lowercase word with a space on each side.
                filedata = filedata.replace(' ' + word + ' ', ' ' + '<font color="Green">' + word + '</font>' + ' ')
                with open(filePath, 'w') as file:
                    file.write(filedata)
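Suppose the word "book" is in the list and can be found in one of the subtitle files. As long as the word appears in a sentence like "This book is amazing", my code works fine. However, when the word is written as "BOOK" or "Book", or when it appears at the beginning or end of a sentence, the code fails. How can I solve this?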
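The behavior described above can be reproduced without any files. The following is a minimal sketch; `naive_highlight` is a hypothetical helper name that wraps only the `replace` call from the question.

```python
def naive_highlight(text, word):
    """Apply the exact-match replacement used in the question (hypothetical helper)."""
    tagged = '<font color="Green">' + word + '</font>'
    # str.replace only finds the literal substring ' word ' (same case, a space on each side).
    return text.replace(' ' + word + ' ', ' ' + tagged + ' ')


if __name__ == "__main__":
    for sample in ["This book is amazing", "OMG what a great BOOK", "Just book."]:
        print(naive_highlight(sample, "book"))
```

Only the first sample is changed; the other two come back untouched because the word is either in a different case or not surrounded by spaces on both sides.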
Solution
You are using str.replace. From the documentation:
Return a copy of the string with all occurrences of substring old replaced by new
Here, an occurrence means an exact match of the string old, so the function only replaces the word when it is surrounded by spaces, as in ' book '. That is different from ' BOOK ', ' Book ' and ' book'. Here are some cases that do not match either:
" book " == " BOOK "  # False
" book " == " book"  # False
" book " == " Book "  # False
" book " == " bOok "  # False
" book " == "  book "  # False (extra leading space)
One alternative is to use a regular expression like this:
import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK",
             "One Book to rule them all", "Just book."]

# \b matches a word boundary, so the word is also found at the start or end of
# a sentence and next to punctuation; re.IGNORECASE covers "BOOK" and "Book".
# re.escape keeps words that contain regex metacharacters literal.
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE | re.UNICODE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 re-inserts the matched text, so the original casing is preserved.
        result = pattern.sub(r'<font color="Green">\1</font>', result)
    print(result)
Output
This <font color="Green">book</font> is amazing
The not so good <font color="Green">book</font>
OMG what a great <font color="Green">BOOK</font>
One <font color="Green">Book</font> to <font color="Green">rule</font> them all
Just <font color="Green">book</font>.
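To connect this back to the subtitle files in the question, here is a minimal sketch that applies the same word-boundary, case-insensitive idea to every .srt file in a directory tree. The function names (`build_pattern`, `highlight`, `highlight_srt_files`) and the UTF-8 default encoding are assumptions made for illustration; they do not come from the original answer. It also swaps the per-word loop for a single alternation pattern, which is a variation on the approach above rather than the answer's own code.

```python
import os
import re


def build_pattern(words):
    """Compile one case-insensitive pattern matching any listed word as a whole word."""
    # Skip empty entries (e.g. blank lines in the word list) and escape metacharacters.
    escaped = [re.escape(w) for w in words if w]
    if not escaped:
        raise ValueError("word list is empty")
    return re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)


def highlight(text, pattern):
    """Wrap every match in a green font tag, keeping the original casing via \\1."""
    return pattern.sub(r'<font color="Green">\1</font>', text)


def highlight_srt_files(subs_dir, words, encoding="utf-8"):
    """Rewrite each .srt file under subs_dir in place (encoding is an assumption)."""
    pattern = build_pattern(words)
    for root, _dirs, files in os.walk(subs_dir):
        for name in files:
            if name.endswith(".srt"):
                # os.walk yields bare file names, so join them with root.
                path = os.path.join(root, name)
                with open(path, "r", encoding=encoding) as f:
                    data = f.read()
                with open(path, "w", encoding=encoding) as f:
                    f.write(highlight(data, pattern))
```

Using one combined pattern means each file is read and written once, and every word is matched in a single pass, so a list entry such as "font" cannot match inside a tag inserted for an earlier word. Running the function twice on the same files would still wrap the words a second time, so it is safest to work on a copy of the subtitles.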