Writing a regex in Python to match two specific words, allowing a set number of words between them

Problem description

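I'm having some difficulty writing a regular expression in Python to identify strings that contain two words, in a specific order, that are between 2 and 4 words apart. For example, given the phrase "fired job", I would want to identify the string "I was fired from my job". My initial thought was that the best approach would be to allow 2 to 4 spaces between the two words. I wrote the following, which does not seem to work; any input would be appreciated.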

re.search('(fired)(\s{2,4})(job)','I was fired from my job')

Tags: python, regex

Solution


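In (fired)(\s{2,4})(job), the \s{2,4} part matches 2 to 4 consecutive whitespace characters and nothing else, so it does not allow any optional words between the fired and job substrings.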
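A quick check with a few illustrative strings (made up for this demonstration) shows the problem: the original pattern fails on the example sentence, and it only succeeds when the two words are separated purely by a run of 2 to 4 whitespace characters, so even a single ordinary space is too few.

```python
import re

# The pattern from the question: "fired", then 2-4 whitespace characters, then "job".
pattern = r"(fired)(\s{2,4})(job)"

# Words in between: \s{2,4} cannot consume the letters of "from" or "my".
print(re.search(pattern, "I was fired from my job"))  # None

# A single space is also too few for \s{2,4}.
print(re.search(pattern, "I was fired job"))  # None

# Only a run of 2-4 whitespace characters between the words succeeds.
print(re.search(pattern, "I was fired   job"))  # a Match for 'fired   job'
```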

Use this pattern instead:

\bfired(?:\s+\S+){0,2}\s+job\b


Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  fired                    'fired'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (between 0 and 2
                           times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  ){0,2}                   end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  job                      'job'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
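To see what the {0,2} quantifier allows in practice, the short check below runs the suggested pattern against sentences with zero to three words between fired and job. The sample sentences are illustrative only.

```python
import re

# The answer's pattern: at most two whitespace-separated tokens between the words.
pattern = r"\bfired(?:\s+\S+){0,2}\s+job\b"

samples = [
    "I was fired job",               # 0 words in between
    "I was fired from job",          # 1 word in between
    "I was fired from my job",       # 2 words in between
    "I was fired from my last job",  # 3 words in between, more than {0,2} allows
]

results = [bool(re.search(pattern, s)) for s in samples]
for s, ok in zip(samples, results):
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

Note that \S+ accepts any run of non-whitespace characters, so tokens such as "-" or "2019" also count toward the limit of two.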

Python code

import re
s = 'I was fired from my job'
if re.search(r"\bfired(?:\s+\S+){0,2}\s+job\b", s):
    print("Matched!")
else:
    print("Not matched.")

Result: Matched!
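The question starts from a two-word phrase such as "fired job". Below is a minimal sketch, assuming you want to build the pattern from that phrase with a configurable gap; near_pattern and its min_gap/max_gap parameters are hypothetical names, not part of the re module. The bounds count only the words in between, so the answer's {0,2} corresponds to min_gap=0, max_gap=2.

```python
import re

def near_pattern(first: str, second: str, min_gap: int = 0, max_gap: int = 2) -> str:
    """Hypothetical helper: pattern for `first` followed by `second`, with
    min_gap..max_gap whitespace-separated tokens between them."""
    if not 0 <= min_gap <= max_gap:
        raise ValueError("expected 0 <= min_gap <= max_gap")
    # re.escape keeps regex metacharacters in the words from being interpreted.
    # Doubled braces {{ }} produce literal braces for the {m,n} quantifier.
    return (
        rf"\b{re.escape(first)}"
        rf"(?:\s+\S+){{{min_gap},{max_gap}}}"
        rf"\s+{re.escape(second)}\b"
    )

# Split the two-word phrase from the question into its target words.
first, second = "fired job".split()

# Defaults reproduce the answer's pattern (0 to 2 words in between).
print(near_pattern(first, second))  # \bfired(?:\s+\S+){0,2}\s+job\b
print(bool(re.search(near_pattern(first, second), "I was fired from my job")))  # True

# Require 1 to 3 words in between instead.
wider = near_pattern(first, second, min_gap=1, max_gap=3)
print(bool(re.search(wider, "I was fired from my last job")))  # True
print(bool(re.search(wider, "I was fired job")))  # False
```

The \b anchors assume both target words begin and end with word characters; for targets that start or end with punctuation, the boundaries would need adjusting.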

