Text replacement does not work in special cases

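Problem description

I have a word-list file named Words.txt that contains hundreds of words, and some subtitle files (.srt). I want to go through all the subtitle files and search them for every word in the word list. If a word is found, I want to change its color to green. This is the code: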

import fileinput
import os
import re

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'
wordList = []

wordFile = open(wordsPath, 'r')
for line in wordFile:
    line = line.strip()
    wordList.append(line)

for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
                with open(fileName, 'r') as file :
                    filedata = file.read()
                    filedata = filedata.replace(' '  +word+  ' ', ' ' + '<font color="Green">' +word+'</font>' + ' ')
                with open(fileName, 'w') as file:
                    file.write(filedata)

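Suppose the word "book" is in the list and can be found in one of the subtitle files. As long as the word appears in a sentence like "This book is amazing", my code works fine. However, when the word is written as "BOOK" or "Book", or when it appears at the beginning or end of a sentence, the code fails. How can I solve this?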

Tags: python, python-3.x, search, replace

Solution


You are using str.replace. From the documentation:

Return a copy of the string with all occurrences of substring old replaced by new

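Here, an occurrence means an exact match of the string old. Your call therefore only replaces the word when it is surrounded by spaces and spelled exactly as in the list, for example ' book ', which is different from ' BOOK ', ' Book ' and ' book'. Here are some cases that do not match either: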

" book " == " BOOK "  # False
" book " == " book"  # False
" book " == " Book "  # False
" book " == " bOok " # False
" book " == "   book " # False

One alternative is to use a regular expression like this:

import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK", "One Book to rule them all",
             "Just book."]

# \b matches at word boundaries, so the word is also found at the start or end
# of a sentence and next to punctuation; re.IGNORECASE covers "BOOK" and "Book".
# re.escape keeps any regex metacharacters in a word literal.
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 re-inserts the matched text, preserving its original casing
        result = pattern.sub(r'<font color="Green">\1</font>', result)
    print(result)

Output

This <font color="Green">book</font> is amazing
The not so good <font color="Green">book</font>
OMG what a great <font color="Green">BOOK</font>
One <font color="Green">Book</font> to <font color="Green">rule</font> them all
Just <font color="Green">book</font>.
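
To connect this back to the subtitle files in the question, here is a minimal sketch of how the same idea might be applied to every .srt file in a folder. The function names (build_pattern, highlight, highlight_srt_files) are hypothetical and not part of the original answer. The sketch assumes the word list has one word per line and that the files can be read with the platform's default encoding. It also joins root and the file name with os.path.join, since os.walk yields bare file names, and it combines all words into a single pattern so each file is read and written only once.

```python
import os
import re


def build_pattern(words):
    """Compile one case-insensitive, whole-word pattern covering all words."""
    if not words:
        raise ValueError("the word list is empty")
    # re.escape keeps regex metacharacters in a word literal
    alternation = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b({})\b".format(alternation), re.IGNORECASE)


def highlight(text, pattern):
    """Wrap every match in a green font tag, keeping its original casing."""
    return pattern.sub(r'<font color="Green">\1</font>', text)


def highlight_srt_files(subs_path, words_path):
    """Rewrite every .srt file under subs_path in place (hypothetical helper)."""
    with open(words_path, "r") as word_file:
        # Skip blank lines so they do not end up in the pattern
        words = [line.strip() for line in word_file if line.strip()]
    pattern = build_pattern(words)

    for root, dirs, files in os.walk(subs_path):
        for file_name in files:
            if file_name.endswith(".srt"):
                # os.walk yields bare names, so build the full path first
                full_path = os.path.join(root, file_name)
                with open(full_path, "r") as srt_file:
                    data = srt_file.read()
                with open(full_path, "w") as srt_file:
                    srt_file.write(highlight(data, pattern))
```

Because the substitution runs in a single pass, the inserted tags are not scanned again, so a list entry such as "font" or "green" would not corrupt markup added for another word. With the paths from the question, a call would look like highlight_srt_files('C:/Users/John/Desktop/Subs/Season1', 'C:/Users/John/Desktop/Subs/Words.txt'). Running it on copies of the files first is advisable, since the files are overwritten.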
