Printing names in first-name last-name format

Problem description

I have a text file that contains the following data:

Last name, First name in some of the cases

For example:

The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina
The patient was referred by Potter, Rob,M.D.
Sam was referred by Dr. Alisa Russo

I want the output to be:

John Douglas
Rob Potter
Alisa Russo

I am using this code:

print(str(string.partition(',')[2].split()[0] +" "+string.partition(',')[0].split()[0]))

Tags: regex, python-3.x, spacy, data-extraction

Solution


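The first challenge is extracting the doctors' first and last names. This is hard because the names are messy: some are written "First Last", others "Last, First", and the "Dr." and "M.D." titles are not used consistently. A regular expression with a few alternations can help, for example: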

(?:Dr. )(\w+) (\w+)|(?:Dr. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)
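
To make the three alternatives easier to read, the same pattern can be sketched in verbose mode with one comment per branch. This is only an illustration: the name `NAME_PATTERN` is hypothetical, and the dot after "Dr" is escaped here as a small tightening that the pattern above does not have (there, the unescaped dot matches any character).

```python
import re

# Verbose-mode sketch of the pattern above. Literal spaces are written
# as "[ ]" because re.VERBOSE ignores unescaped whitespace in the pattern.
NAME_PATTERN = re.compile(r"""
      Dr\.[ ](\w+)[ ](\w+)            # groups 1-2: "Dr. First Last"
    | Dr\.[ ](\w+),[ ](\w+)           # groups 3-4: "Dr. Last, First"
    | (\w+),[ ](\w+),?[ ]?M\.?D\.?    # groups 5-6: "Last, First, M.D."
""", re.VERBOSE)

# Each sample line triggers a different alternative.
print(NAME_PATTERN.search("Sam was referred by Dr. Alisa Russo").group(1, 2))
print(NAME_PATTERN.search("referred by Dr. Douglas, John, updated").group(3, 4))
print(NAME_PATTERN.search("referred by Potter, Rob,M.D.").group(5, 6))
```

Only the first branch captures the name in first-last order; the other two capture the last name first, so their groups have to be swapped when building the result.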


Code sample

import re

regex = r"(?:Dr. )(\w+) (\w+)|(?:Dr. )(\w+), (\w+)|(\w+), (\w+),?(?: ?M\.?D\.?)"

test_str = ("The patient was referred by Dr. Douglas, John, updated by: Acosta, Christina\n"
    "The patient was referred by Potter, Rob,M.D.\n"
    "Sam was referred by Dr. Alisa Russo")

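matches = re.finditer(regex, test_str, re.MULTILINE)
results = []

for match in matches:
    # Only one alternative matches at a time, so only one pair of groups is set.
    if match.group(1):
        # "Dr. First Last": already in first-last order
        results.append([match.group(1), match.group(2)])
    elif match.group(3):
        # "Dr. Last, First": swap into first-last order
        results.append([match.group(4), match.group(3)])
    elif match.group(5):
        # "Last, First, M.D.": swap into first-last order
        results.append([match.group(6), match.group(5)])

print(results)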

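The output is a list of lists, each holding a first name followed by a last name. From there, printing becomes very easy.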

[['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]
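
As a minimal sketch of that last step, each pair can be joined with a space to produce the format requested in the question. The helper name `format_names` is hypothetical, and the list is copied from the output above.

```python
def format_names(pairs):
    """Join each [first, last] pair into a single 'First Last' string."""
    return [" ".join(pair) for pair in pairs]


# The list of lists produced by the code sample above.
results = [['John', 'Douglas'], ['Rob', 'Potter'], ['Alisa', 'Russo']]

for name in format_names(results):
    print(name)
# Prints:
# John Douglas
# Rob Potter
# Alisa Russo
```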
