How can I use a regular expression to find certain words in a file?

Problem description

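I have many files and need to classify them by the words that appear in them.

For example: [..murder..murderAttempted..] or [murder, crimeAttempted], etc.

I tried the code below, but it did not print all of the matches. What I want is every place in the files where "murder" and "murderAttempted" appear enclosed in "[]".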

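import os
import re

def func(root_dir):
    for files in os.listdir(root_dir):
        # Pattern from the original attempt; it misses some bracketed matches
        pattern = r'\[.+murder.+murderAttempted.+'
        if "txt" in files:
            # Join the path safely and close the file when done
            with open(os.path.join(root_dir, files), 'rt', encoding='UTF8') as f:
                for i, line in enumerate(f):
                    for match in re.finditer(pattern, line):
                        print(match.group())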

Tags: python, regex

Solution


This appears to work for me: use pattern = r'\[.*murder.*murderAttempted.*\]' instead of pattern = r'\[.+murder.+murderAttempted.+'. I believe it returns every bracketed string in which "murder" is followed by "murderAttempted". The + quantifier requires one or more characters, whereas * also allows zero, so the original pattern fails whenever "murder" comes directly after the opening "[". Also note the added \] at the end, which ensures that only strings enclosed in brackets are captured.
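
To make the difference concrete, here is a minimal sketch that runs both patterns over a few sample lines. The helper find_bracketed and the sample strings are hypothetical, made up for illustration. The third pattern, strict_pattern, is not part of the answer above: it is an assumed alternative that replaces . with [^\]] so that a match cannot run past a closing bracket when a line holds several bracketed groups.

```python
import re

# Pattern from the question: every gap must contain at least one character
old_pattern = re.compile(r'\[.+murder.+murderAttempted.+')
# Pattern from the answer: gaps may be empty, and a closing bracket is required
new_pattern = re.compile(r'\[.*murder.*murderAttempted.*\]')
# Assumed alternative (not from the answer): gaps may not contain ']'
strict_pattern = re.compile(r'\[[^\]]*murder[^\]]*murderAttempted[^\]]*\]')


def find_bracketed(pattern, line):
    """Return every substring of `line` matched by the compiled `pattern`."""
    return [m.group() for m in pattern.finditer(line)]


samples = [
    "[murder, murderAttempted]",       # "murder" right after "["
    "[..murder..murderAttempted..",    # never closed
    "[murder] x [murderAttempted]",    # two separate bracketed groups
]

for line in samples:
    print(repr(line))
    print("  old:   ", find_bracketed(old_pattern, line))
    print("  new:   ", find_bracketed(new_pattern, line))
    print("  strict:", find_bracketed(strict_pattern, line))
```

Because .* is greedy and also matches "]", the answer's pattern treats the third sample as a single match spanning both groups. Whether that is acceptable depends on how the files are laid out; the stricter character class is one way to rule it out.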

