python - Accessing regex capture groups in Python
Question
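ptx captures most of what I want. Because I could not combine everything into a single regular expression, I created a second regex, ptx1, which should additionally capture the following character sequences:
One Department
One foreign Department
Two office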
import re

text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
ptx = re.compile(r'(\s+something(?:\s+|\\n)*patternx:)(.*)(One\s+foreign)', flags = re.DOTALL)
ten = ptx.search(text_list)
try:
    if ten:
        ten = ten.group(2)
    else:
        ten = None
except:
    pass
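My question is: what do I need to do so that the (.*) part, i.e. the text_i_want content, is returned? My hunch was that, because there are several capture groups, I need to access the result like a list, e.g. eleven[0].group(1), to take the first element from the list and get its second group. But that did not work either.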
You can think of text_list as:
text_list = ['...something\npatternx: text_i_want One Department',
'...something patternx: text_i_want One foreign Department',
'...something\n patternx: text_i_want Two office']
Update
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
ptx = re.compile(r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b', flags = re.DOTALL)
ten = ptx.search(text_list)
try:
    if ten:
        # Note: this pattern has only one capture group, so group(2) raises
        # IndexError (silently swallowed by the bare except below).
        # The captured text is in group(1).
        ten = ten.group(2)
    else:
        ten = None
except:
    pass
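Solution
It looks like you got tripped up by the alternatives at the right-hand end of the pattern: the original regex ends with One\s+foreign only, so it cannot stop at One Department or Two office.
You need to use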
\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b
which can be shortened to
\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b
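As a quick check that the shortened form captures the same text as the longer one, the two patterns can be run side by side. This is only a sketch on the sample data from the question; the variable names (long_rx, short_rx and so on) are made up for illustration.

```python
import re

# Sample text from the question, already joined into a single string.
text = (
    " something\npatternx: text_i_want One Department "
    " something patternx: text_i_want One foreign Department "
    " something\n patternx: text_i_want Two office"
)

# Long form: every terminator spelled out as its own alternative.
long_rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+foreign|One\s+Department|One\s+foreign\s+Department|Two\s+office)\b',
    re.DOTALL,
)
# Shortened form: the common "One" prefix is factored out.
short_rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

# With exactly one capture group, findall returns the group 1 strings.
long_groups = long_rx.findall(text)
short_groups = short_rx.findall(text)
print(long_groups == short_groups)

# The overall matches (group 0) can still differ in where they end.
long_full = [m.group(0) for m in long_rx.finditer(text)]
short_full = [m.group(0) for m in short_rx.finditer(text)]
```

On this input, group 1 is identical for both patterns. The overall match differs slightly: in the long form One\s+foreign is tried before One\s+foreign\s+Department, so the second match stops at One foreign, while the shortened form consumes One foreign Department.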
See the regex demo. Details:
- \bsomething\s+patternx: - the whole word something, one or more whitespace characters, then the string patternx:
- (.*?) - Group 1: any zero or more characters, as few as possible
- \b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b - One Department, One foreign, One foreign Department, or Two office as whole words.
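The breakdown above can also be written directly into the pattern with re.VERBOSE, which ignores whitespace inside the pattern and allows # comments. This is a sketch of the same regex, not a different one; the names rx, sample and match are illustrative only.

```python
import re

rx = re.compile(
    r"""
    \bsomething\s+patternx:    # whole word something, whitespace, then patternx:
    (.*?)                      # group 1: any characters, as few as possible
    \b(?:                      # word boundary, then one of the terminators
        One\s+(?:Department|foreign(?:\s+Department)?)
      | Two\s+office
    )\b                        # the terminator must end on a word boundary
    """,
    re.VERBOSE | re.DOTALL,    # DOTALL lets . match line breaks as well
)

sample = " something\npatternx: text_i_want One Department"
match = rx.search(sample)
captured = match.group(1) if match else None
print(repr(captured))
```

Only the (.*?) part is a capturing group; the (?:...) groups are non-capturing, so the pattern exposes a single group, numbered 1.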
See the Python demo:
import re
text_list = [' something\npatternx: text_i_want One Department',' something patternx: text_i_want One foreign Department',' something\n patternx: text_i_want Two office']
text_list = ' '.join(map(str, text_list))
rx = r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b'
print(re.findall(rx, text_list, re.DOTALL))
# => [' text_i_want ', ' text_i_want ', ' text_i_want ']
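If you prefer to keep the search-based flow from the question, the captured text is available as group(1) of the match object. The sketch below assumes you want the values with surrounding whitespace stripped; the names first and all_values are hypothetical.

```python
import re

text_list = [' something\npatternx: text_i_want One Department',
             ' something patternx: text_i_want One foreign Department',
             ' something\n patternx: text_i_want Two office']
text = ' '.join(text_list)

rx = re.compile(
    r'\bsomething\s+patternx:(.*?)\b(?:One\s+(?:Department|foreign(?:\s+Department)?)|Two\s+office)\b',
    re.DOTALL,
)

# search returns a single match object (or None), not a list,
# so there is nothing to index with [0].
first = rx.search(text)
ten = first.group(1).strip() if first else None

# finditer yields one match object per occurrence.
all_values = [m.group(1).strip() for m in rx.finditer(text)]

# Asking for a group number the pattern does not define raises IndexError.
try:
    first.group(2)
    group_two_exists = True
except IndexError:
    group_two_exists = False

print(ten, all_values, group_two_exists)
```

Checking the match for None explicitly avoids the need for a bare try/except, which would otherwise hide the IndexError from a wrong group number.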