Python re.split and re.findall: grouping and capturing

Problem description

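I have strings like "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3" and "00:00 Segment 1 20:0 Segment 2", and I want to use re.split() and re.findall() to find all the timestamps and segment names. However, I cannot make the third pair of digits optional without the group's capturing side effect. This is what I have so far: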

import re

str_1 = "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3"
str_2 = "00:00 Segment 1 20:0 Segment 2"

re.findall(r'\d\d?:\d\d?:\d\d?', str_1)
=>  ['00:00:00', '00:20:00', '8:00:00']

re.split(r'\d\d?:\d\d?:\d\d?', str_1)
=> ['', ' Segment 1 ', ' Segment 2 ', ' Segment 3']

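The above works fine, but it cannot handle str_2. If I make the third pair of digits optional, findall returns only the optional group, and split inserts it into the result: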

re.findall(r'\d\d?:\d\d?(:\d\d?)?', str_1)
=> [':00', ':00', ':00']

re.split(r'\d\d?:\d\d?(:\d\d?)?', str_1)
=> ['', ':00', ' Segment 1 ', ':00', ' Segment 2 ', ':00', ' Segment 3']

re.findall(r'\d\d?:\d\d?(:\d\d?)?', str_2)
=> ['', '']

re.split(r'\d\d?:\d\d?(:\d\d?)?', str_2)
=> ['', None, ' Segment 1 ', None, ' Segment 2']

However, if I make the optional group non-capturing, str_2 works fine, but the results for str_1 are wrong:

re.findall(r'\d\d?:\d\d?(?:\d\d?)?', str_1)
=> ['00:00', '00:20', '8:00']

re.split(r'\d\d?:\d\d?(?:\d\d?)?', str_1)
=> ['', ':00 Segment 1 ', ':00 Segment 2 ', ':00 Segment 3']

re.findall(r'\d\d?:\d\d?(?:\d\d?)?', str_2)
=> ['00:00', '20:0']

re.split(r'\d\d?:\d\d?(?:\d\d?)?', str_2)
=> ['', ' Segment 1 ', ' Segment 2']

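I would like to find a regular expression that works correctly on both str_1 and str_2: something like an optional group, but without the capturing effect. Is there any way to do this?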

Tags: python, regex

Solution


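It looks like you are missing a : in your pattern. You need two of them: one that is part of the non-capturing group syntax ?:, and another for your literal :, like so: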

re.findall(r'\d\d?:\d\d?(?::\d\d?)?', str_1)
=> ['00:00:00', '00:20:00', '8:00:00']
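
As a quick check, the corrected pattern can be run against both strings with findall and split. The sketch below also pairs each timestamp with the segment name that follows it, since that seems to be the goal of the question. The names TIMESTAMP and parse_segments are hypothetical and introduced here only for illustration. It also assumes every string starts with a timestamp.

```python
import re

str_1 = "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3"
str_2 = "00:00 Segment 1 20:0 Segment 2"

# (?: ... ) is a non-capturing group; the second ':' inside it is the
# literal colon that precedes the optional third pair of digits.
TIMESTAMP = re.compile(r'\d\d?:\d\d?(?::\d\d?)?')


def parse_segments(text):
    """Pair each timestamp with the segment name that follows it.

    Hypothetical helper: assumes the text starts with a timestamp.
    """
    stamps = TIMESTAMP.findall(text)
    # split() returns the text before the first timestamp as its first
    # element (an empty string here), so that element is dropped.
    names = [part.strip() for part in TIMESTAMP.split(text)[1:]]
    return list(zip(stamps, names))


print(TIMESTAMP.findall(str_2))   # ['00:00', '20:0']
print(TIMESTAMP.split(str_2))     # ['', ' Segment 1 ', ' Segment 2']
print(parse_segments(str_1))
```

Because the pattern contains no capturing group, findall returns the whole match and split does not insert group contents (or None) into its result, so the same pattern works for both functions.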
