Extract the required names from a given format

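Problem description

I have a text file containing data like the following. I need to extract certain names from it. I am trying the code below, but I am not getting the desired result.

The file contains the following data: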

Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432

The code I am trying:

import re
pattern = re.compile(r'(Leader|Head\\Organiser|Captain|Vice Captain).*(\w+)',re.I)
matches=pattern.findall(line)
for match in matches:
    print(match)

Expected output:

Tim Lee
Sam Mathews
Jocey David
Jacob Green

Tags: python, regex, python-3.x, data-extraction

Solution
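
Before the fix, it may help to see why the attempt in the question falls short. The sketch below runs the question's pattern unchanged against a single line; the variable names `sample` and `result` are illustrative and not from the original post. Because `.*` is greedy, it consumes the rest of the line and leaves only the final word character for `(\w+)`, and because the pattern has two capturing groups, `findall` returns tuples rather than names.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'(Leader|Head\\Organiser|Captain|Vice Captain).*(\w+)', re.I)

# One line of the sample data (illustrative variable name).
sample = 'Leader :     Tim Lee ; 34567'

# Greedy ".*" swallows everything up to the last character,
# so group 2 holds only the trailing "7".
result = pattern.findall(sample)
print(result)  # [('Leader', '7')]
```

Making the keyword group non-capturing and replacing `.*` with an explicit separator class avoids both problems.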


import re

# Raw string so the backslashes in "Head\Organiser" and "Head\Secretary" stay literal
line = r'''
Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432'''

# Role keyword, then separators, then a one- or two-word name captured in group 1
pattern = re.compile(r'(?:Leader|Head(?:\\Organiser|\\Secretary)?|Captain|Vice Captain)\W+(\w+(?:\s+\w+)?)', re.I)
matches = pattern.findall(line)
for match in matches:
    print(match)

Explanation:

(?:                 : start non-capturing group
  Leader            : literally
 |                  : OR
  Head              : literally
  (?:               : start non-capturing group
    \\Organiser     : a literal backslash, then "Organiser"
   |                : OR
    \\Secretary     : a literal backslash, then "Secretary"
  )?                : end group, optional
 |                  : OR
  Captain           : literally
 |                  : OR
  Vice Captain      : literally
)                   : end group
\W+                 : 1 or more non-word characters
(                   : start group 1
  \w+               : 1 or more word characters
  (?:               : start non-capturing group
    \s+             : 1 or more whitespace characters
    \w+             : 1 or more word characters
  )?                : end group, optional
)                   : end group 1
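
The breakdown above can also be kept inside the pattern itself by compiling with `re.VERBOSE`. This is a sketch of the same expression, not a different technique; the names `verbose_pattern`, `sample` and `names` are illustrative. In verbose mode unescaped whitespace is ignored, so the space in "Vice Captain" has to be written as `[ ]`.

```python
import re

# Same expression as in the solution, laid out with inline comments.
verbose_pattern = re.compile(r'''
    (?:                      # role keyword, not captured
        Leader
      | Head (?: \\Organiser | \\Secretary )?
      | Captain
      | Vice[ ]Captain       # literal space must be protected in verbose mode
    )
    \W+                      # one or more separator characters
    (                        # group 1: the name
        \w+                  #   first word
        (?: \s+ \w+ )?       #   optional second word
    )
''', re.IGNORECASE | re.VERBOSE)

sample = r'''
Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432'''

names = verbose_pattern.findall(sample)
for name in names:
    print(name)
```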

Result for the given example:

Tim Lee
Sam Mathews
Alica Mills
Maya Hill
Jocey David
Jacob Green
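
Note that this result has six names, while the expected output in the question lists only four: the plain "Head" and "Head\Secretary" entries are absent there. A minimal sketch of a stricter variant follows, assuming those two roles are meant to be skipped and that each record starts at the beginning of a line. The names `strict_pattern`, `sample` and `names` are illustrative.

```python
import re

sample = r'''
Leader :     Tim Lee ; 34567
Head\Organiser: Sam Mathews; 11:53 am
Head: Alica Mills; 45612
Head\Secretary: Maya Hill; #53190
Captain- Jocey David # 45123
Vice Captain:- Jacob Green;  -65432'''

# Only the four roles from the question's pattern are accepted, and
# "^" with re.MULTILINE requires the role to start a line, so a bare
# "Head:" or "Head\Secretary:" line produces no match.
strict_pattern = re.compile(
    r'^(?:Leader|Head\\Organiser|Vice Captain|Captain)\W+(\w+(?:\s+\w+)?)',
    re.IGNORECASE | re.MULTILINE,
)

names = strict_pattern.findall(sample)
for name in names:
    print(name)
```

If the intent was instead to keep every role, the solution's pattern already covers that case.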
