python - How to match & replace multiple strings with regex in Python
Problem description
I am trying to replace some text in Python with regex.
My text looks like this:
WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
What I am trying to do is put the names in double square brackets and remove the IDs so that it will end up looking like this.
WORKGROUP 1. [[John Doe]], [[Jane Smith]], [[Ohe Keedoke]]
Situation paragraph 1
WORKGROUP 2. [[John Smith]], [[Jane Doe]]
Situation paragraph 2
So far I have this.
# text holds the input shown above
text = re.sub(r"(WORKGROUP\s\d\.\s)", r"\1[[", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b),(?:\s)(.+\n)", r"\1]], [[\2", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b)(\n)", r"\1]]\2", text)
This works for groups with two people (WORKGROUP 2) but leaves all the IDs except the first and last persons' if there are more than two. So WORKGROUP 1 ends up looking like this.
WORKGROUP 1. [[John Doe]], [[Jane Smith ID456, Ohe Keedoke]]
Situation paragraph 1
Unfortunately, I can't do something like
re.sub(r"((\s\b\w+\b),(\s))+", r"\1]], [[\2", text)
because it will match inside the situation paragraphs.
My question is: is it possible to do multiple match/replacements in a string segment without doing it universally?
Solution
If you have the regex module installed, this expression may work:
(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)
If you do not have it, you can install it from the terminal by running:
$ pip install regex
or
$ pip3 install regex
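The third-party regex module matters here because the lookbehind in the expression has no fixed width (it contains \s+ and \d+, and its two alternatives differ in length), and the standard library re module does not accept such lookbehinds. A quick check, as a sketch rather than a definitive test:

```python
import re

# Same expression as in the answer; the lookbehind contains \s+ and \d+,
# so the text it has to look back over has no fixed length.
pattern = r"(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)"

try:
    re.compile(pattern)
    stdlib_accepts = True
except re.error as exc:
    # The standard library rejects variable-width lookbehinds at compile time.
    stdlib_accepts = False
    print("re.compile failed:", exc)
```

With `import regex as re`, the same pattern compiles, which is why the answer relies on that module.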
This expression assumes that ID\d+ may also appear elsewhere in your text; if it does not, the problem is much simpler.
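For that simpler case, a minimal sketch with the standard library re module could look like the following. It assumes that "ID" followed by digits never occurs outside the WORKGROUP lines and that names contain only word characters and spaces; both are assumptions for illustration, not something the question guarantees.

```python
import re

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
"""

# Assumption: "ID<digits>" appears only after a member name, and a name is
# made of word characters and spaces. The lazy group captures the name,
# and the trailing whitespace plus ID is dropped by the replacement.
simple = re.sub(r"(\w[\w ]*?)\s+ID\d+", r"[[\1]]", text)
print(simple)
```

Names containing hyphens, apostrophes, or periods would need a wider character class than the one used here.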
Test
import regex as re  # third-party regex module, not the standard library re
regex = r"(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)"
test_str = '''
WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
WORKGROUP 11. Bob Doe ID123, Alice Doe ID123, John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 21. John Smith ID321, Jane Doe ID654
Situation paragraph 2
'''
# Replace each "Name ID123" entry with the bracketed name.
subst = r"[[\1]]"
print(re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE))
Output
WORKGROUP 1. [[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1
WORKGROUP 2. [[John Smith]],[[Jane Doe]]
Situation paragraph 2
WORKGROUP 11. [[Bob Doe]],[[Alice Doe]],[[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1
WORKGROUP 21. [[John Smith]],[[Jane Doe]]
Situation paragraph 2
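If you wish to simplify, modify, or explore the expression, it is explained in the top right panel of regex101.com, where you can also watch how it matches against some sample inputs.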
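On the original question of replacing inside one segment without replacing everywhere: one alternative, which is not the approach of the answer above, is to pass a function to re.sub so that an inner substitution runs only on the lines matched by an outer pattern. The sketch below uses only the standard library; the names line_re, member_re, and bracket_members are hypothetical, and the "ID999" in the situation paragraph is added purely to show that text outside WORKGROUP lines is left alone.

```python
import re

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph ID999 stays untouched
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
"""

# Outer pattern: a whole WORKGROUP line. Group 1 is the header,
# group 2 is the comma-separated member list.
line_re = re.compile(r"^(WORKGROUP[ \t]+\d+\.[ \t]*)(.*)$", flags=re.MULTILINE)

# Inner pattern: one "Name ID123" entry. The name starts at a character
# that is neither a comma nor whitespace and contains no commas.
member_re = re.compile(r"([^,\s][^,]*?)\s+ID\d+")

def bracket_members(match):
    # Rewrite only the member list; the header passes through unchanged.
    members = member_re.sub(r"[[\1]]", match.group(2))
    return match.group(1) + members

result = line_re.sub(bracket_members, text)
print(result)
```

Unlike the output shown above, this keeps the space after each comma, because the inner pattern never consumes the whitespace in front of a name.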