Regex to find everything in the sentence that contains a word

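Problem description

I'm trying to figure out how to find the sentence that contains a certain word. Say the word is "wow"; then the following four strings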

\nOkay hold on. This is pretty wow in here. Okay.\n

\nThis is super wow. Doesn't get much more wow than that.\n

\nHold up. wow.\n

\nOkay wow. Just wow!\n

would produce the following results, respectively:

This is pretty wow in here

This is super wow.

wow.

Okay wow.

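I'm doing this in Python 3, so I could write if statements, but that gets messy and I'd like to avoid it. Here is my code, which worked at first but has started to fail. Maybe I'm just bad at regex and am overcomplicating this.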

    m = re.search('(?:(\.\s[A-Z]))(?=(.*)' + name+ '([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())
    if m == None:
        m = re.search('(?:(\.\s[A-Z]))(?=(.*)' + name+ '([^a-z^A-Z]))(.*)(\.\s[A-Z])', node.getIntroText())
    if m == None:
        m = re.search('(?:([\r\n]))(?=(.*)' + name+ '([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())

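Essentially, I want to capture everything from the first period or newline before "name" up to the next period that is followed by either (a space and anything other than a letter) or a newline.

Tags: regex, python-3.x

Solution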


Converting my comment to an answer. You can use this regex:

>>> import re
>>> reg = re.compile(r"^(?:(?:(?!\bwow\b)[^.\n])*\. +)*((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*)(?=\.)", re.MULTILINE | re.IGNORECASE)
>>> test_str = ("\n"
...     "Okay hold on. This is pretty wow in here. Okay.\n\n"
...     "This is super wow. Doesn't get much more wow than that.\n\n"
...     "Hold up. wow.\n\n"
...     "Okay wow. Just Wow!\n")
>>> print(reg.findall(test_str))
['This is pretty wow in here', 'This is super wow', 'wow', 'Okay wow']


RegEx explanation:

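  • ^: start of a line (because of re.MULTILINE)
  • (?:(?:(?!\bwow\b)[^.\n])*\. +)*: match 0 or more leading sentences that do not contain the word wow, each ending in a period followed by one or more spaces
  • ((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*): match and capture the sentence that contains the word wow
  • (?=\.): assert that the next character is a period, without consuming it, so the period is not part of the result
  • The flags re.MULTILINE | re.IGNORECASE make ^ match at the start of every line and make the match case-insensitive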
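
The question builds its pattern from a variable (name) rather than a fixed word, so the same regex can be assembled for any search word. The following is a minimal sketch under some assumptions, not part of the original answer: the function name first_sentence_with_word is hypothetical, the word is assumed to start and end with word characters so that \b behaves as expected, and re.escape is used so that any regex metacharacters in the word are matched literally.

```python
import re


def first_sentence_with_word(text, word):
    """For each line of text, return the first sentence containing word.

    Hypothetical helper: it only parametrizes the pattern from the answer.
    """
    # Whole-word match; re.escape neutralizes regex metacharacters in word.
    w = r"\b" + re.escape(word) + r"\b"
    pattern = (
        r"^(?:(?:(?!" + w + r")[^.\n])*\. +)*"       # skip leading sentences without the word
        + r"((?:[a-z][^.\n]*?)?" + w + r"[^.\n]*)"   # capture the sentence with the word
        + r"(?=\.)"                                  # the sentence must end in a period
    )
    return re.findall(pattern, text, re.MULTILINE | re.IGNORECASE)


test_str = (
    "\n"
    "Okay hold on. This is pretty wow in here. Okay.\n\n"
    "This is super wow. Doesn't get much more wow than that.\n\n"
    "Hold up. wow.\n\n"
    "Okay wow. Just Wow!\n"
)

print(first_sentence_with_word(test_str, "wow"))
# ['This is pretty wow in here', 'This is super wow', 'wow', 'Okay wow']
```

Because of the (?=\.) lookahead, a sentence that ends in "!" or "?" is not matched; that is consistent with the expected results in the question, where "Just wow!" is not returned.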
