python - How do I get the indices of the parts of a string left over after regex matches in Python?
Problem description
I have already found the matching substrings with a regular expression in Python, as shown below.
import re
matches = re.finditer(r'<\S+?>',' Hi <a> This is </a> an example! ')
for match in matches:
    print(
        "matched string: '%s', start index: %s, end index: %s"
        % (match.group(0), match.span(0)[0], match.span(0)[1])
    )
This prints:
matched string: '<a>', start index: 4, end index: 7
matched string: '</a>', start index: 16, end index: 20
Now I want to get the indices of the remaining (unmatched) parts of the string, for example:
[0,4],[7,16],[20,33]
Solution
Something like this should give you the expected output:
import re
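# named `text` rather than `str` so the built-in str type is not shadowed
text = ' Hi <a> This is </a> an example! '
matches = re.finditer(r'<\S+?>', text)
start = 0  # where the current unmatched stretch begins
output = []
for match in matches:
    # the unmatched stretch runs from the previous match's end to this match's start
    output.append([start, match.start()])
    start = match.end()
# whatever follows the last match
output.append([start, len(text)])
print(output)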
Output:
[[0, 4], [7, 16], [20, 33]]
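If you need this in more than one place, the same loop can be wrapped in a small function. The following is only a sketch: the name `remaining_spans` and the `keep_empty` flag are made up for illustration and are not part of the `re` module. The flag is there because a match at the very start or end of the string (or two adjacent matches) would otherwise produce empty pairs such as [0, 0].

```python
import re

def remaining_spans(pattern, text, keep_empty=True):
    """Return [start, end] pairs for the parts of text NOT matched by pattern.

    Hypothetical helper: end indices are exclusive, like match.end().
    """
    spans = []
    start = 0
    for match in re.finditer(pattern, text):
        # skip zero-length gaps when keep_empty is False
        if keep_empty or match.start() > start:
            spans.append([start, match.start()])
        start = match.end()
    # tail after the last match (or the whole string if nothing matched)
    if keep_empty or start < len(text):
        spans.append([start, len(text)])
    return spans

text = ' Hi <a> This is </a> an example! '
spans = remaining_spans(r'<\S+?>', text)
print(spans)
# the pairs can be used directly as slice bounds
print([text[s:e] for s, e in spans])
```

Because the end index of each pair is exclusive, `text[s:e]` gives exactly the unmatched piece for that pair.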