Python - Capture multiple substrings between multiple substrings

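Problem description

My data is in a very badly formatted .txt file. I am trying to capture the meaningful words/sentences that sit between certain start and end strings. So far I have found about 4 types of substring patterns in a single text, and I want to capture the strings between these multiple start and end substrings. I can capture the first occurrence correctly, but not the second, third, and so on.

Start and end strings: FOO, BARS, BAR, BAR2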

text = 'I do not want this FOO string1 BARS I do not want this FOO string 2 BAR I do not want this FOO string3 BAR2 I do not want this FOO string4 BARS '


snippet1 = text[text.index('FOO')+len('FOO'):text.index('BARS')] \
            if text[text.index('FOO')+len('FOO'):text.index('BARS')] else ''

snippet2 = text[text.index('FOO')+len('FOO'):text.index('BAR')] \
            if text[text.index('FOO')+len('FOO'):text.index('BAR')] else ''

snippet3 = text[text.index('FOO')+len('FOO'):text.index('BAR2')] \
            if text[text.index('FOO')+len('FOO'):text.index('BAR2')] else ''

# print(type(snippet1))
print('')
print('snippet1:',snippet1) #Output: snippet1:  string1
print('')
print('snippet2',snippet2) # Output: snippet2  string1
print('')
print('snippet3',snippet3) # Output: snippet3  string1 BARS I do not want this FOO string 2 BAR I do not want this FOO string3

# How do I get this output? Is it possible to code this?
snippet1:  string1
snippet2:  string2
snippet3:  string3

Tags: python, regex, string, python-3.x

Solution


IIUC, you can do this with a regex:

import re
txt='I do not want this FOO string1 BARS I do not want this FOO string 2 BAR I do not want this FOO string3 BAR2 I do not want this FOO string4 BARS '
re.findall('FOO(.*?)BAR', txt)

This produces a list of the matched strings:

[' string1 ', ' string 2 ', ' string3 ', ' string4 ']
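Each captured item keeps the spaces that surround it in the text. The pattern works for all three end markers because BARS, BAR and BAR2 all begin with BAR, and the lazy group `(.*?)` stops at the first BAR after each FOO. A minimal sketch of a variant that also trims the padding, assuming surrounding whitespace is not wanted (the name `snippets` is only illustrative):

```python
import re

txt = ('I do not want this FOO string1 BARS I do not want this FOO string 2 BAR '
       'I do not want this FOO string3 BAR2 I do not want this FOO string4 BARS ')

# \s* on both sides of the lazy group absorbs the padding around each snippet,
# so the captured text starts and ends on a non-space character.
snippets = re.findall(r'FOO\s*(.*?)\s*BAR', txt)
print(snippets)  # ['string1', 'string 2', 'string3', 'string4']
```

Calling `str.strip()` on each result of the original pattern would give the same list.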

Update, to match multiple end keywords:

import re
txt='I do not want this FOO string1 BARS I do not want this FOO string 2 SECTION I do not want this FOO string3 BAR2 I do not want this FOO string4 BARS'
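# Use a non-capturing group for alternation; square brackets would define a
# character class that matches any single one of the listed characters.
re.findall('FOO(.*?)(?:BAR|SECTION)', txt)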

This results in:

[' string1 ', ' string 2 ', ' string3 ', ' string4 ']
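If the end keywords are kept in a list, the alternation can be built programmatically. The following is a sketch under assumptions rather than a definitive implementation: `end_markers`, `pattern` and `pairs` are hypothetical names, and the marker list is just the keywords seen in this thread. Sorting the markers longest first lets BARS or BAR2 win over the shorter BAR, so the match can also report which marker closed each snippet.

```python
import re

txt = ('I do not want this FOO string1 BARS I do not want this FOO string 2 SECTION '
       'I do not want this FOO string3 BAR2 I do not want this FOO string4 BARS')

# Hypothetical list of end keywords; adjust it to the markers in the real data.
end_markers = ['BARS', 'BAR2', 'BAR', 'SECTION']

# re.escape guards against markers that contain regex metacharacters.
# Longest-first ordering makes 'BARS' and 'BAR2' match before the shorter 'BAR'.
alternation = '|'.join(
    re.escape(marker) for marker in sorted(end_markers, key=len, reverse=True)
)
pattern = re.compile(r'FOO\s*(.*?)\s*(' + alternation + ')')

# Each match yields the snippet (group 1) and the marker that ended it (group 2).
pairs = [(m.group(1), m.group(2)) for m in pattern.finditer(txt)]
for snippet, marker in pairs:
    print(repr(snippet), 'ended by', marker)
```

For the sample text this pairs string1 with BARS, string 2 with SECTION, string3 with BAR2 and string4 with BARS.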
