Finding the next/previous string after a Python regex match

Problem description

I need to find the names of the people mentioned in a text, and I need to filter all the names using a list of keywords, for example:

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"...]

For example, in the text:

INPUT: "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO 
and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "

OUTPUT:
(magistrate, DANIEL SMITH)
(officer, MARCO ANTONIO)
(defendant, WILL SMITH)
(plaintiff, MARIA FREEMAN)

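So I have two problems: first, handling the case where the name is mentioned before the keyword, and second, building a single regular expression that uses all the keywords as filters at once.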

I have tried a few things:

line = re.split("magistrate",text)[1]
name = []
for key in line.split():
    if key.isupper(): name.append(key)
    else:
        break
" ".join(name)
OUTPUT: 'DANIEL SMITH'

Thanks!

Tags: python, regex

Solution


I suggest using re.findall with two capturing groups, as follows:

import re
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
found = re.findall('('+'|'.join(key_words)+')'+r'\s+([ A-Z]+[A-Z])',line)
print(found)

Output:

[('magistrate', 'DANIEL SMITH'), ('officer', 'MARCO ANTONIO'), ('plaintiff', 'MARIA FREEMAN')]

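Explanation: using multiple capturing groups (written with ( and )) in the pattern passed to re.findall produces a list of tuples (2-tuples in this case). The first group is built simply by joining the keywords with |, which works like OR inside a pattern. Next comes one or more whitespace characters (\s+); this sits outside any group, so it does not appear in the results. Finally, the second group consists of one or more spaces or ASCII uppercase letters ([ A-Z]+) followed by a single ASCII uppercase letter ([A-Z]), so it does not capture a trailing space. Note that this pattern only finds names that come after a keyword, which is why (defendant, WILL SMITH) is absent from the output.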
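The question also asks about names that appear before the keyword. One possible way to cover that is sketched below, under the assumption that a name preceding its keyword is always separated from it by a comma, as in the sample sentence. The helper find_roles and the name sub-pattern are hypothetical names introduced for illustration and are not part of the original answer; re.escape is added only as a precaution in case a keyword ever contains regex metacharacters.

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN ")

# Alternation of all keywords, escaped in case one contains special characters.
keys = "|".join(re.escape(k) for k in key_words)
# Assumed name shape: one or more all-uppercase words separated by single spaces.
name = r"[A-Z]+(?: [A-Z]+)*"

# Case 1: keyword followed by the name, e.g. "magistrate DANIEL SMITH".
keyword_then_name = re.compile(rf"\b({keys})\s+({name})\b")
# Case 2 (assumption): name, a comma, then the keyword, e.g. "WILL SMITH, defendant".
name_then_keyword = re.compile(rf"\b({name}),\s+({keys})\b")

def find_roles(text):
    """Return (keyword, name) pairs in the order they appear in the text."""
    hits = [(m.start(), m.group(1), m.group(2)) for m in keyword_then_name.finditer(text)]
    hits += [(m.start(), m.group(2), m.group(1)) for m in name_then_keyword.finditer(text)]
    return [(role, person) for _, role, person in sorted(hits)]

print(find_roles(line))
```

For the sample sentence this yields all four pairs, including ('defendant', 'WILL SMITH'). Text that uses a different separator between name and keyword (for example "WILL SMITH the defendant") would need a correspondingly adjusted second pattern.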

