Python: searching for different strings on the same line

Problem description

I have the following code that I want to optimize:

if re.search(str(stringA), line) and re.search(str(stringB), line):
    .....
    .....

I tried:

stringAB = stringA + '.*' + stringB
if re.search(str(stringAB), line):
    .....
    .....

But the results I get are unreliable. I'm using re.search here because it seems to be the only way I can search for the exact regular expressions specified in stringA and stringB.
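One likely reason the combined pattern looks unreliable is that stringA + '.*' + stringB only matches when stringA appears before stringB on the line, while the two separate searches accept either order. A minimal sketch of the difference, using the sample values from the egrep example and two hypothetical log lines (the real file contents are not shown in the question):

```python
import re

stringA = "Success"
stringB = "mysqlDB01"

# Hypothetical sample lines; the real data is not shown in the question.
lines = [
    "Success: backup finished on mysqlDB01",
    "mysqlDB01 backup finished: Success",
]

combined = stringA + ".*" + stringB


def both_present(line):
    """Two independent searches: the order on the line does not matter."""
    return bool(re.search(stringA, line) and re.search(stringB, line))


def combined_match(line):
    """One concatenated pattern: stringA must come before stringB."""
    return bool(re.search(combined, line))


for line in lines:
    print(repr(line), both_present(line), combined_match(line))
```

The second line contains both strings, but the concatenated pattern rejects it because the strings appear in the opposite order.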

The logic behind this code is modeled on the following egrep command example:

stringA=Success
stringB=mysqlDB01

egrep "${stringA}" /var/app/mydata | egrep "${stringB}"

If there is a better way to do this without re.search, please let me know.
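The egrep pipeline keeps a line only if every pattern is found somewhere on it, in any order. A rough Python equivalent might look like the sketch below. The helper names matching_lines and matching_lines_literal are hypothetical, and the in-memory list stands in for /var/app/mydata. If the strings are literal text rather than regular expressions, the in operator can do the same job without the re module.

```python
import re


def matching_lines(lines, patterns):
    """Yield lines on which every regex pattern is found, in any order."""
    compiled = [re.compile(p) for p in patterns]
    for line in lines:
        if all(p.search(line) for p in compiled):
            yield line


def matching_lines_literal(lines, substrings):
    """Same idea for plain substrings, without the re module."""
    for line in lines:
        if all(s in line for s in substrings):
            yield line


# Hypothetical sample data standing in for the real file.
sample = [
    "Success on mysqlDB01",
    "Failure on mysqlDB01",
    "mysqlDB01 reported Success",
    "Success on mysqlDB02",
]

print(list(matching_lines(sample, ["Success", "mysqlDB01"])))
print(list(matching_lines_literal(sample, ["Success", "mysqlDB01"])))
```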

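Tags: python

Solution


One approach is to build a pattern that matches either word (using \b so that only complete words match), use re.findall to collect all the matches in the string, and then use set equality to make sure both words were matched.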

import re

stringA = "spam"
stringB = "egg"

words = {stringA, stringB}

# Make a pattern that matches either word
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))

data = [
    "this string has spam in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.findall(s)
    print(repr(s), found, set(found) == words)

Output

'this string has spam in it' ['spam'] False
'this string has egg in it' ['egg'] False
'this string has egg in it and another egg too' ['egg', 'egg'] False
'this string has both egg and spam in it' ['egg', 'spam'] True
"the word spams shouldn't match" [] False
"and eggs shouldn't match, either" [] False

A more efficient alternative to set(found) == words is words.issubset(found), since it skips the explicit conversion of found to a set.
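The two checks agree here because the pattern can only ever match words that are in the set, so found never contains anything else. A small sketch of the issubset variant, with has_all_words as a hypothetical helper name:

```python
import re

words = {"spam", "egg"}
pat = re.compile(r"\b(?:spam|egg)\b")


def has_all_words(s):
    """True if every word in `words` occurs in s as a whole word."""
    # issubset accepts any iterable, so the list returned by findall
    # can be passed directly without building a set first.
    return words.issubset(pat.findall(s))


print(has_all_words("this string has both egg and spam in it"))
print(has_all_words("this string has egg in it and another egg too"))
```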


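As Jon Clements mentioned in the comments, we can simplify and generalize the pattern to handle any number of words, and we should use re.escape in case any of the words contain regex metacharacters.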

pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
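To see why the escaping matters, the sketch below compares the escaped and unescaped patterns. The search term "v1.0" and the sample text are hypothetical, chosen only because the term contains the metacharacter ".".

```python
import re

# Hypothetical search terms; "v1.0" contains the regex metacharacter "."
words = ["v1.0", "egg"]

unescaped = re.compile(r"\b({})\b".format("|".join(words)))
escaped = re.compile(r"\b({})\b".format("|".join(re.escape(w) for w in words)))

text = "release v1x0 shipped"

# Unescaped, "." matches any character, so "v1x0" counts as a hit.
print(unescaped.findall(text))

# Escaped, "." only matches a literal dot, so nothing is found.
print(escaped.findall(text))
```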

Thanks, Jon!


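Here is a version that matches the words in the specified order. If a match is found it prints the matching substring; otherwise it prints None.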

import re

stringA = "spam"
stringB = "egg"
words = [stringA, stringB]

# Make a pattern that matches all the words, in order
pat = r"\b.*?\b".join([re.escape(word) for word in words])
pat = re.compile(r"\b" + pat + r"\b")

data = [
    "this string has spam and also egg, in the proper order",
    "this string has spam in it",
    "this string has spamegg in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]

for s in data:
    found = pat.search(s)
    if found:
        found = found.group()
    print('{!r}: {!r}'.format(s, found))

Output

'this string has spam and also egg, in the proper order': 'spam and also egg'
'this string has spam in it': None
'this string has spamegg in it': None
'this string has egg in it': None
'this string has egg in it and another egg too': None
'this string has both egg and spam in it': None
"the word spams shouldn't match": None
"and eggs shouldn't match, either": None
