How do I define a quantifier for a group of conditions in a regular expression?

Problem description

I have this string:

"Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"

and a regular expression pattern like this:

((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)

or

(Za\s)?@[A-Za-z0-9_]*

I want it to return this list:

['Za @Foo_Bar','BAR_foo','FooBAR','BArfoo'] 

But I get unexpected results:

>>> import re
>>> import regex
>>> a = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
>>> regex.fullmatch(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a) is None
True
>>> re.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]

The second result is more promising, but it contains a lot of junk values:

>>> regex.findall(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
[('Za @Foo_Bar', 'Za ', ''), ('@BAR_foo', '', ''), ('@FooBAR', '', ''), ('@BArfoo', '', '')]
>>> match  = re.search(u'((Za\s)?@[A-Za-z0-9_]*)|(@[A-Za-z0-9_]*)',a)
>>> match.groups()
('Za @Foo_Bar', 'Za ', None)

Why does fullmatch return None? How can I get a clean list?

Tags: python, regex

Solution


Don't use capturing groups:

import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
g = re.findall(r'(?:Za\s)@\w+|(?<=@)\w+', s)
print(g)

Output:

['Za @Foo_Bar', 'BAR_foo', 'FooBAR', 'BArfoo']

Explanation:

  (?:Za\s)  # non-capturing group: "Za" followed by one whitespace character
  @         # a literal @
  \w+       # one or more word characters
|           # or
  (?<=@)    # lookbehind: require an @ immediately before this position
  \w+       # one or more word characters
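
The question also asks why fullmatch returns None. A minimal sketch of the difference, reusing the pattern from this answer (the variable names are illustrative only): fullmatch succeeds only when a single match covers the whole string, whereas findall collects every non-overlapping match, and it returns tuples only when the pattern contains capturing groups.

```python
import re

s = "Za @Foo_Bar: @BAR_foo @FooBAR @BArfoo"
pattern = r'(?:Za\s)@\w+|(?<=@)\w+'

# fullmatch requires one match to span the entire string, so it
# returns None here: the string contains several separate matches.
whole = re.fullmatch(pattern, s)

# It does succeed on a string that is exactly one match.
single = re.fullmatch(pattern, "Za @Foo_Bar")

# With no capturing groups, findall returns the matched text itself.
clean = re.findall(pattern, s)

# With capturing groups, findall returns one tuple of groups per match,
# using '' for groups that did not take part in the match.
grouped = re.findall(r'((Za\s)?@\w+)', s)

print(whole)
print(single.group())
print(clean)
print(grouped)
```

Note that in Python 3, \w on a str pattern also matches non-ASCII letters and digits; passing flags=re.ASCII should restrict it to [A-Za-z0-9_] as in the original pattern.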
