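python - Why doesn't this regex match everything up to the next occurrence of the first capture group?

Problem description

How can I get it to do that?

Right now it stops at the line break (for example, right after "Chicago,"). Alternatively, if I use DOTALL, it matches "Abbott A (1988)" and then the rest of the string all the way to the end. I want it to stop at the next occurrence of (([\w\s]+)\(([1|2]\d{3})\)), i.e. at "Albu OB and Flyverbom M (2016)", and so on for each following entry.

Any pointers are welcome.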

pattern = r"(([\w\s]+)\(([1|2]\d{3})\))(.*)"

Sample string

"Abbott A (1988) The System of Professions: An Essay on the Division of Expert Labor. Chicago,
IL: University of Chicago Press.
Albu OB and Flyverbom M (2016) Organizational transparency: conceptualizations, con-
ditions, and consequences. Business & Society. Epub ahead of print 13 July. DOI:
10.1177/0007650316659851.
Ananny M (2016) Toward an ethics of algorithms: convening, observation, probability, and timeli-
ness. Science, Technology & Human Values 41(1): 93–117. DOI: 10.1177/0162243915606523."
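The behaviour described in the question can be reproduced with a short script. This is only a sketch: the variable names (text, pattern, default_matches, dotall_matches) are illustrative, and the sample string is assumed to contain real line breaks exactly where the excerpt above wraps.

```python
import re

# Sample text from the question, with explicit line breaks.
text = (
    "Abbott A (1988) The System of Professions: An Essay on the Division of Expert Labor. Chicago,\n"
    "IL: University of Chicago Press.\n"
    "Albu OB and Flyverbom M (2016) Organizational transparency: conceptualizations, con-\n"
    "ditions, and consequences. Business & Society. Epub ahead of print 13 July. DOI:\n"
    "10.1177/0007650316659851.\n"
    "Ananny M (2016) Toward an ethics of algorithms: convening, observation, probability, and timeli-\n"
    "ness. Science, Technology & Human Values 41(1): 93–117. DOI: 10.1177/0162243915606523."
)

# Original pattern from the question. Note that [1|2] also accepts a
# literal "|", because "|" is not an alternation operator inside a class.
pattern = r"(([\w\s]+)\(([1|2]\d{3})\))(.*)"

# Without flags, "." does not match a newline, so the trailing (.*)
# stops at the end of the line on which each "(year)" appears.
default_matches = list(re.finditer(pattern, text))
print(len(default_matches))
print(repr(default_matches[0].group(4)))

# With re.DOTALL, the greedy (.*) runs to the end of the string,
# so the first match swallows every later entry.
dotall_matches = list(re.finditer(pattern, text, re.DOTALL))
print(len(dotall_matches))
```

In both cases the trailing (.*) has no notion of "the next entry": it is bounded either by the line break or by the end of the input, which is why neither variant stops at the following author and year.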


Tags: python, regex, multiline, citations, multilinestring

You can use

(?sm)^([^()\n\r]+)\(([12]\d{3})\)(.*?)(?=^[^()\n\r]+\([12]\d{3}\)|\Z)


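Details

(?sm) - enables re.DOTALL and re.MULTILINE
^ - start of a line
([^()\n\r]+) - Group 1: one or more characters other than (, ), CR and LF
\( - a ( character
([12]\d{3}) - Group 2: 1 or 2 followed by any three digits
\) - a ) character
(.*?) - Group 3: any zero or more characters, including line breaks, as few as possible, up to (but not including) the first position where the lookahead below succeeds
(?=^[^()\r\n]+\([12]\d{3}\)|\Z) - a positive lookahead that requires one of its alternatives to match immediately to the right of the current location:
- ^[^()\r\n]+\([12]\d{3}\) - the same as the start of the pattern, but without the groups
- | - or
- \Z - the very end of the string.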
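As a rough illustration of the breakdown above, the pattern can be applied with re.finditer to split the sample text into one match per reference. The names text, pattern, entries, authors and years are chosen for this sketch only, and the sample string is assumed to contain real line breaks where the excerpt wraps.

```python
import re

# Sample text from the question, with explicit line breaks.
text = (
    "Abbott A (1988) The System of Professions: An Essay on the Division of Expert Labor. Chicago,\n"
    "IL: University of Chicago Press.\n"
    "Albu OB and Flyverbom M (2016) Organizational transparency: conceptualizations, con-\n"
    "ditions, and consequences. Business & Society. Epub ahead of print 13 July. DOI:\n"
    "10.1177/0007650316659851.\n"
    "Ananny M (2016) Toward an ethics of algorithms: convening, observation, probability, and timeli-\n"
    "ness. Science, Technology & Human Values 41(1): 93–117. DOI: 10.1177/0162243915606523."
)

# (?sm) turns on DOTALL and MULTILINE inline; the lazy (.*?) stops as soon
# as the lookahead sees the next "author (year)" line start, or the end.
pattern = r"(?sm)^([^()\n\r]+)\(([12]\d{3})\)(.*?)(?=^[^()\n\r]+\([12]\d{3}\)|\Z)"

entries = list(re.finditer(pattern, text))
authors = [m.group(1).strip() for m in entries]
years = [m.group(2) for m in entries]

for m in entries:
    # Collapse the internal line breaks of each reference body for display.
    body = " ".join(m.group(3).split())
    print(m.group(1).strip(), m.group(2), "|", body)
```

Because the lookahead does not consume any text, each match ends exactly where the next entry begins, so consecutive matches cover the input without gaps or overlap. The "41(1)" in the last entry does not trigger a split, since the lookahead requires four digits between the parentheses.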
