How to match & replace multiple strings with regex in Python

Problem description

I am trying to replace some text in Python with regex.

My text looks like this:

WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2

What I am trying to do is put the names in double square brackets and remove the IDs so that it will end up looking like this.

WORKGROUP 1. [[John Doe]], [[Jane Smith]], [[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 2. [[John Smith]], [[Jane Doe]]
Situation paragraph 2

So far I have this.

text = re.sub(r"(WORKGROUP\s\d\.\s)", r"\1[[", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b),(?:\s)(.+\n)", r"\1]], [[\2", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b)(\n)", r"\1]]\2", text)

This works for groups with two people (WORKGROUP 2) but leaves all the IDs except the first and last persons' if there are more than two. So WORKGROUP 1 ends up looking like this.

WORKGROUP 1. [[John Doe]], [[Jane Smith ID456, Ohe Keedoke]]
Situation paragraph 1

Unfortunately, I can't do something like

text = re.sub(r"((\s\b\w+\b),(\s))+", r"\1]], [[\2", text)

because it will match inside the situation paragraphs.

My question is: is it possible to do multiple match/replacements in a string segment without doing it universally?

Tags: python, regex, python-3.x

Solution


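If you have the third-party regex module installed, then

(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)

might work OK. The lookbehind in this expression is variable-width, which the standard re module does not accept, hence the regex module.

If you don't have it, you can install it from the terminal by running: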

$ pip install regex

or

$ pip3 install regex

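Here, we're assuming that ID\d+ might also appear elsewhere in your text (for example in the situation paragraphs). If it does not, your problem is much simpler.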
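For comparison, here is a minimal sketch of that simpler case, using only the standard re module. It assumes that ID followed by digits never occurs outside the WORKGROUP lines and that names contain only word characters and spaces; the pattern and the variable names are illustrative, not part of the original answer.

```python
import re

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
"""

# Assumption: "ID<digits>" never occurs outside the WORKGROUP lines, and a
# name consists only of word characters and spaces.
# Group 1 lazily captures the name; the trailing ID is matched and dropped.
simple_pattern = r"(\w[\w ]*?)\s+ID\d+"

result = re.sub(simple_pattern, r"[[\1]]", text)
print(result)
```

Because the commas and the spaces after them are never part of a match, the original ", " separators are left as they are.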

Test

import regex as re

regex = r"(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)"

test_str = '''

WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2

WORKGROUP 11. Bob Doe ID123, Alice Doe ID123, John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 21. John Smith ID321, Jane Doe ID654
Situation paragraph 2

'''


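# Wrap each captured name in double square brackets; the ID is dropped.
subst = r"[[\1]]"

# MULTILINE lets $ match at the end of every line, not only at the end of the text.
print(re.sub(regex, subst, test_str, flags=re.MULTILINE))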

Output

WORKGROUP 1. [[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1
WORKGROUP 2. [[John Smith]],[[Jane Doe]]
Situation paragraph 2

WORKGROUP 11. [[Bob Doe]],[[Alice Doe]],[[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 21. [[John Smith]],[[Jane Doe]]
Situation paragraph 2

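If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also watch there how it matches against some sample inputs.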
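On the original question of replacing repeatedly inside one segment without replacing everywhere: one possible approach with only the standard re module is to match each WORKGROUP line as a whole and run a second substitution on just that line from a replacement function. The sketch below is an alternative to the expression above, not the method of the answer. The names HEADER, MEMBER, and bracket_members are hypothetical, and it assumes that names contain no commas or periods.

```python
import re

# One whole WORKGROUP header line; MULTILINE makes ^ and $ work per line.
HEADER = re.compile(r"^WORKGROUP\s+\d+\..*$", re.MULTILINE)

# Inside a header line: a name followed by its ID, e.g. "John Doe ID123".
# Assumption: a name contains no commas or periods and does not start
# with whitespace.
MEMBER = re.compile(r"([^\s,.][^,.]*?)\s+ID\d+")


def bracket_members(text):
    """Bracket names and drop IDs, but only on WORKGROUP header lines."""

    def fix_header(match):
        # The inner substitution sees only the matched header line,
        # so the situation paragraphs are never touched.
        return MEMBER.sub(r"[[\1]]", match.group(0))

    return HEADER.sub(fix_header, text)


sample = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1, reported by Ann Lee ID999

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
"""

print(bracket_members(sample))
```

With this layout the ID in the first situation paragraph is left untouched, and the ", " separators between names are preserved because they are never part of a match.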


