Accessing regex capture groups in Python

Problem description

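ptx captures most of what I want. Because I could not combine everything into a single regular expression, I created a second regex, ptx1, which should additionally capture the following character sequences: One Department, One foreign Department, Two office.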

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags = re.DOTALL)
    ten = ptx.search(text_list)
    try:
        if ten:
            ten = ten.group(2)
        else:
            ten = None
    except:
        pass

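My question is: what do I need to do to get the content matched by (.*), that is text_i_want, returned? I have a hunch that I need to access it the way I access the eleven list, since it has many capture groups: eleven[0].group(1), to take the first element of the list and then get its second group. But that did not work either.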

You can use a text_list like this:

    text_list = ['...something\npatternx: text_i_want One Department',
                 '...something patternx: text_i_want One foreign Department',
                 '...something\n patternx: text_i_want Two office']

Update

    text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
    text_list = ' '.join(map(str, text_list))
    ptx = re.compile(r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b', flags = re.DOTALL)
    ten = ptx.search(text_list)
    # The pattern defines only one capture group, so read group(1).
    ten = ten.group(1) if ten else None

Tags: python, regex

Solution


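It looks like the alternatives on the right-hand side of the pattern tripped you up. The original pattern stops only at One\s+foreign, and because (.*) is greedy and re.DOTALL is set, group 2 runs all the way to the last One foreign in the joined string instead of stopping at the first delimiter.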
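
To make the over-capture visible, the following sketch runs the question's original ptx pattern on the joined sample text. The names joined and greedy_capture are introduced here for illustration only and are not part of the original post.

```python
import re

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
joined = ' '.join(text_list)

# Original pattern from the question: greedy (.*) and a single right-hand delimiter.
ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)',
                 flags=re.DOTALL)

m = ptx.search(joined)
greedy_capture = m.group(2) if m else None

# The greedy group spans the first item and part of the second one, so it
# swallows "One Department" and a second "text_i_want" along the way.
print(repr(greedy_capture))
```

The printed value contains far more than the wanted text, which is why a lazy quantifier and a complete list of right-hand delimiters are needed.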

You need to use

\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b

which can be shortened to

\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b
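
As a quick check, the sketch below compares the long and the shortened pattern on the sample text. The variable names (long_rx, short_rx, and so on) are illustrative assumptions. Group 1 comes out the same for both, although the full match differs for "One foreign Department": in the long form the alternative One\s+foreign is tried first and already satisfies the trailing \b, so the match ends after "foreign".

```python
import re

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
joined = ' '.join(text_list)

long_rx = (r'\bsomething\s+patternx:(.*?)\b'
           r'(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b')
short_rx = (r'\bsomething\s+patternx:(.*?)\b'
            r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b')

# Group 1 (the wanted text) for every match of each pattern.
long_groups = re.findall(long_rx, joined, re.DOTALL)
short_groups = re.findall(short_rx, joined, re.DOTALL)

# Full matches, to show where each pattern stops on the right-hand side.
long_full = [m.group(0) for m in re.finditer(long_rx, joined, re.DOTALL)]
short_full = [m.group(0) for m in re.finditer(short_rx, joined, re.DOTALL)]

print(long_groups == short_groups)
print(repr(long_full[1]))
print(repr(short_full[1]))
```

If the full match matters and not only group 1, the shortened form is the safer choice because its optional \s+Department part is tried before the match ends.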

See the regex demo. Details:

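  • \bsomething\s+patternx: - the whole word something, one or more whitespace characters, then the string patternx:
  • (.*?) - Group 1: any zero or more characters, as few as possible (including line breaks when re.DOTALL is set)
  • \b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b - One Department, One foreign, One foreign Department, or Two office as whole words.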
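
The breakdown above can be tried out with re.finditer, which yields match objects so that the full match, group(0), and the captured text, group(1), can be inspected side by side. This is a minimal sketch; the names joined, full_matches, and captured are assumptions made for the example.

```python
import re

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
joined = ' '.join(text_list)

rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b'
    r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL)

full_matches = []
captured = []
for m in rx.finditer(joined):
    full_matches.append(m.group(0))        # everything the pattern consumed
    captured.append(m.group(1).strip())    # only Group 1, whitespace trimmed

print(captured)
# => ['text_i_want', 'text_i_want', 'text_i_want']
```

Only Group 1 is numbered here, because the right-hand alternatives sit inside non-capturing (?:...) groups.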

See the Python demo:

import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want '] 
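
If the list items should be processed one by one instead of being joined first, something along these lines may work. It is a sketch under the assumption that each item contains at most one match; the helper extract and the extra 'no match here' item are hypothetical and only serve to show the None case.

```python
import re

rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b'
    r'(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL)

def extract(item):
    """Return the trimmed Group 1 text for one item, or None if nothing matches."""
    m = rx.search(item)
    return m.group(1).strip() if m else None

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']

results = [extract(item) for item in text_list + ['no match here']]
print(results)
# => ['text_i_want', 'text_i_want', 'text_i_want', None]
```

Checking the match object before calling group(1) replaces the try/except from the question and keeps real errors visible.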
