python - Searching for different strings on the same line in Python
Problem description
I have the following code that I want to optimize:
if re.search(str(stringA), line) and re.search(str(stringB), line):
    .....
    .....
I tried:
stringAB = stringA + '.*' + stringB
if re.search(str(stringAB), line):
    .....
    .....
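But the results I get are not reliable. I am using re.search here because it seems to be the only way I can search for the exact regular expressions given in stringA and stringB.
The logic behind this code is modeled on this egrep command example: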
stringA=Success
stringB=mysqlDB01
egrep "${stringA}" /var/app/mydata | egrep "${stringB}"
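If there is a better way to do this without re.search, please let me know.
Solution
One approach is to build a pattern that matches either word (using \b so that only whole words match), use re.findall to collect every match in the string, and then use set equality to make sure both words were matched.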
import re
stringA = "spam"
stringB = "egg"
words = {stringA, stringB}
# Make a pattern that matches either word
pat = re.compile(r"\b{}\b|\b{}\b".format(stringA, stringB))
data = [
    "this string has spam in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]
for s in data:
    found = pat.findall(s)
    print(repr(s), found, set(found) == words)
Output
'this string has spam in it' ['spam'] False
'this string has egg in it' ['egg'] False
'this string has egg in it and another egg too' ['egg', 'egg'] False
'this string has both egg and spam in it' ['egg', 'spam'] True
"the word spams shouldn't match" [] False
"and eggs shouldn't match, either" [] False
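A more efficient alternative to set(found) == words is words.issubset(found), because it skips the explicit conversion of found to a set. The two checks agree here because found can only contain words that are in the pattern.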
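The issubset check can be wrapped in a small helper. This is only a minimal sketch: the function name has_all_words is an assumption made for illustration, and the pattern is the two-word one from the example above.

```python
import re

words = {"spam", "egg"}
# Same two-word pattern as above, written with a non-capturing group
pat = re.compile(r"\b(?:spam|egg)\b")

def has_all_words(text):
    """Return True if every word in `words` occurs in text as a whole word.

    has_all_words is a hypothetical helper name, not part of the re module.
    """
    # issubset accepts any iterable, so the list from findall is passed directly
    return words.issubset(pat.findall(text))

print(has_all_words("this string has both egg and spam in it"))
print(has_all_words("this string has egg in it and another egg too"))
```

The first call should report that both words are present; the second should not, because only egg occurs in that string.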
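As Jon Clements mentioned in the comments, we can simplify and generalize the pattern to handle any number of words, and we should use re.escape in case any of the words contain regex metacharacters.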
pat = re.compile(r"\b({})\b".format("|".join(re.escape(word) for word in words)))
Thanks, Jon!
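To show why re.escape matters, here is a rough sketch of the generalized pattern applied to a word that contains a dot. The helper name make_word_pattern and the sample word "mysql.DB01" are assumptions made for illustration; they do not come from the question.

```python
import re

def make_word_pattern(words):
    """Build a whole-word pattern matching any of the given literal words.

    make_word_pattern is a hypothetical helper name used for this sketch.
    """
    # re.escape makes metacharacters such as "." match literally
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:{})\b".format(alternatives))

words = {"Success", "mysql.DB01"}  # hypothetical words; one contains a dot
pat = make_word_pattern(words)

for line in ["Success on mysql.DB01", "Success on mysqlxDB01"]:
    found = pat.findall(line)
    print(repr(line), found, words.issubset(found))
```

Without re.escape the dot would match any character, so the second line would be accepted as well. Note that \b assumes each word starts and ends with a word character; words that begin or end with punctuation would need a different boundary test.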
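Here is a version that matches the words in the specified order. If a match is found it prints the matched substring; otherwise it prints None.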
import re
stringA = "spam"
stringB = "egg"
words = [stringA, stringB]
# Make a pattern that matches all the words, in order
pat = r"\b.*?\b".join([re.escape(word) for word in words])
pat = re.compile(r"\b" + pat + r"\b")
data = [
    "this string has spam and also egg, in the proper order",
    "this string has spam in it",
    "this string has spamegg in it",
    "this string has egg in it",
    "this string has egg in it and another egg too",
    "this string has both egg and spam in it",
    "the word spams shouldn't match",
    "and eggs shouldn't match, either",
]
for s in data:
    found = pat.search(s)
    if found:
        found = found.group()
    print('{!r}: {!r}'.format(s, found))
Output
'this string has spam and also egg, in the proper order': 'spam and also egg'
'this string has spam in it': None
'this string has spamegg in it': None
'this string has egg in it': None
'this string has egg in it and another egg too': None
'this string has both egg and spam in it': None
"the word spams shouldn't match": None
"and eggs shouldn't match, either": None
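The ordered pattern only matches when stringA appears before stringB, whereas the egrep pipeline in the question accepts the two patterns in either order. A minimal sketch of an order-independent filter that keeps each pattern as its own regular expression might look like the following. The helper name line_matches_all and the sample log lines are assumptions made for illustration.

```python
import re

def line_matches_all(line, patterns):
    """Return True if every compiled pattern matches somewhere in line.

    line_matches_all is a hypothetical helper name used for this sketch.
    """
    return all(p.search(line) for p in patterns)

# Compile once, outside the loop, instead of on every line
patterns = [re.compile(p) for p in ("Success", "mysqlDB01")]

# Hypothetical log lines; the first has stringB before stringA
lines = [
    "mysqlDB01 backup: Success",
    "Success on mysqlDB02",
    "Success on mysqlDB01",
]
matching = [ln for ln in lines if line_matches_all(ln, patterns)]
print(matching)
```

The first line is kept even though mysqlDB01 comes before Success, which a single combined pattern of the form stringA + '.*' + stringB would miss.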