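python - Regex: match until the second-to-last occurrence of a pattern
Question
I have this list of strings, text_string. I want to extract all of the text_i_want items into a single string.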
text_string = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
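I want to match everything after the part that the compiled pattern below already finds (that is, everything after "aaaaaa aaaaaaa', '"), up to the second-to-last occurrence of this pattern: ', '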
import re
ans = re.compile(r'aaaaaa\s+aaaaaaa\',\s+\'(.*)', flags=re.DOTALL | re.MULTILINE)
ans_text = ans.search(str(text_string)).group(1)
This returns:
'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want'
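So far my regex matches the start of the string correctly, but it does not stop where I want it to (at the second-to-last occurrence of the pattern). I don't know how to translate "until the second-to-last occurrence of ', '" into regex. Any help is appreciated.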
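Also, I want to do this with re because I have hundreds of lists that are all identical except for the number of times 'text_i_want.' occurs.
An example list that the solution code should also work on: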
text_list2 = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
Updated question:
Some lists end differently from what I described. The good news is that they have a specific feature that sets them apart.
text_list3 = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want.', 'ttttttt tttttt', 'text_i_dont_want', 'text_i_dont_want', 'text_i_dont_want']
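# Match the item that ends with "aaaaaa aaaaaaa"
rx = re.compile(r'aaaaaa\s+aaaaaaa$')
ans = [i for (i, x) in enumerate(text_list3) if rx.search(x)]
pattern = 'ttttttt tttttt'
if ans:
    # Default: drop the last two items
    ans_text = text_list3[ans[0]+1:-2]
    if pattern in text_list3:
        # Lists containing the marker end with three unwanted items
        ans_text = text_list3[ans[0]+1:-3]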
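Answer
You can find the item in text_string (which is a list, not a string) that ends with aaaaaa aaaaaaa, and then use slicing to take everything from the next item up to the last one you need: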
text_string = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
start = [i for (i, x) in enumerate( text_string ) if x.endswith('aaaaaa aaaaaaa')][0]
print( text_string[start+1:-2] )
# => ['text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.']
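The same slicing idea can be wrapped in a small helper so that it can be run over many lists, including the ones from the updated question. This is only a minimal sketch and not part of the original answer: the name extract_wanted, its parameters, and its defaults are hypothetical. It also assumes that the 'ttttttt tttttt' item and everything after it are unwanted, which the question does not state explicitly.

```python
def extract_wanted(items, start_suffix='aaaaaa aaaaaaa',
                   end_marker='ttttttt tttttt', tail=2):
    """Return the items between the start item and the unwanted tail.

    Hypothetical helper: names and defaults are illustrative only.
    """
    # Index of the first item that ends with the start suffix, or None.
    start = next((i for i, x in enumerate(items) if x.endswith(start_suffix)), None)
    if start is None:
        return []
    if end_marker in items[start + 1:]:
        # Assumption: the marker and everything after it are unwanted.
        stop = items.index(end_marker, start + 1)
    else:
        # Otherwise drop the last `tail` items, as in the slicing above.
        stop = len(items) - tail
    return items[start + 1:stop]


text_string = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
text_list2 = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
text_list3 = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want.', 'ttttttt tttttt', 'text_i_dont_want', 'text_i_dont_want', 'text_i_dont_want']

for lst in (text_string, text_list2, text_list3):
    # Join the wanted items into a single string, as the question asks.
    print(' '.join(extract_wanted(lst)))
```

Looking up the marker by position rather than hard-coding a second negative offset keeps the helper independent of how many unwanted items follow the marker.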
If you prefer to check for the aaaaaa aaaaaaa ending with re, you can use:
import re
text_string = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
rx = re.compile(r'aaaaaa\s+aaaaaaa$')
start = [i for (i, x) in enumerate( text_string ) if rx.search(x)]
if start:
    print( text_string[start[0]+1:-2] )
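For completeness, "until the second-to-last occurrence of ', '" can also be expressed directly as a regex over str(text_string), which is what the question originally attempted. The sketch below is an illustration rather than the answer's method. It assumes that no item contains a single quote or the sequence ', ' itself, because otherwise the list's printed form would not split cleanly. The function name wanted_from_repr is hypothetical.

```python
import ast
import re

text_string = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want\ntext_i_want.', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']
text_list2 = ['text_i_dont_want aaaaaa aaaaaaa', 'text_i_want', 'text_i_want.', 'text_i_dont_want\text_i_dont_want', 'number_i_dont_want']

# The greedy (.*) first runs to the end of the string and then backtracks
# until the remainder still contains two more "', '" separators, so the
# group stops right before the second-to-last separator.
rx = re.compile(r"aaaaaa\s+aaaaaaa', '(.*)', '.*', '", flags=re.DOTALL)


def wanted_from_repr(items):
    """Hypothetical helper: extract the wanted items via the list's repr."""
    m = rx.search(str(items))
    if m is None:
        return []
    # Rebuild a list literal from the captured span and parse it back,
    # which also turns escaped sequences such as \n into real characters.
    return ast.literal_eval("['" + m.group(1) + "']")


print(wanted_from_repr(text_string))
print(wanted_from_repr(text_list2))
```

Since this approach depends on how Python prints the list, slicing the list directly is usually the more robust choice; the regex form is mainly useful when only the printed text is available.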