How to extract text within a given time range

Problem description

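I have the text below. How can I extract only the text that falls within a certain time range? My current code extracts all of the values.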

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a

00:00:17,039 --> 00:00:28,470
will come from an unexpected place

00:00:18,039 --> 00:00:19,470

00:00:20,039 --> 00:00:21,470

00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give

00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it

00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

For example, can I extract the text from the 00:00:17,039 --> 00:00:28,470 block through the 00:00:30,119 --> 00:00:35,430 block?

Here is my code, which returns all of the values:

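import re

lines = s.split('\n')
result = {}         # maps each timestamp line to the text that follows it
current_key = None  # timestamp line of the block currently being read

for line in lines:
    # A line such as "00:00:14,099 --> 00:00:19,100" starts a new block
    is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
    if is_key_match_obj:
        current_key = is_key_match_obj.group()
        continue

    if current_key:
        if current_key in result:
            # Blank lines separate blocks; keep them as newlines
            if not line:
                result[current_key] += '\n'
            else:
                result[current_key] += line
        else:
            result[current_key] = line

print(result.values())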
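One way to answer the question while keeping the line-by-line approach is to track which block is being read and only start collecting at the first boundary. This is an illustrative sketch, not the asker's code: `extract_range`, `first_key` and `last_key` are hypothetical names, the boundaries are assumed to be given as the exact timestamp lines, and blocks are assumed to already appear in the order they should be returned. Empty blocks are kept as empty strings so the result lines up with the expected output.

```python
import re

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a

00:00:17,039 --> 00:00:28,470
will come from an unexpected place

00:00:18,039 --> 00:00:19,470

00:00:20,039 --> 00:00:21,470

00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give

00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it

00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

# Same timestamp-line pattern as in the question, written as a raw string
KEY_RE = re.compile(r'[\d:,]{12}\s-->\s[\d:,]{12}')


def extract_range(text, first_key, last_key):
    """Hypothetical helper: captions from the block headed by first_key
    through the block headed by last_key (inclusive), in document order."""
    captions = []
    current = None  # timestamp line of the block being collected, if any
    for line in text.split('\n'):
        stripped = line.strip()
        if KEY_RE.fullmatch(stripped):
            if current == last_key:
                break  # the last requested block has been read
            if current is not None or stripped == first_key:
                current = stripped
                captions.append('')  # new caption; stays '' if the block is empty
            continue
        if current is not None and stripped:
            # Join multi-line captions with a single space
            captions[-1] = (captions[-1] + ' ' + stripped).strip()
    return captions


print(extract_range(s, '00:00:17,039 --> 00:00:28,470',
                    '00:00:30,119 --> 00:00:35,430'))
```

Because the loop breaks at the first timestamp line after `last_key`, it does no further work once the requested range has been read.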

Expected output, covering the blocks from 00:00:17,039 --> 00:00:28,470 through 00:00:30,119 --> 00:00:35,430:

dict_values(['will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it'])

Tags: python, regex

Solution


There is no need to iterate line by line. Try the code below; it gives you the dictionary you want.

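import re

# Each match is a (timestamp line, following line) pair; dict() turns the pairs into key -> text
result = dict(re.findall(r'(\d{2}:\d{2}.*)\n(.*)', s))
print(result.values())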

Output

dict_values(['a classic math problem a', 'will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it', 'like I said you have a thousand bottles'])
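The dictionary above still holds every block, so the time range itself has not been applied yet. A minimal sketch of that last step, assuming a block belongs to the range when its start time lies between the two bounds (inclusive); `to_ms` and `captions_starting_between` are hypothetical helper names. Each `HH:MM:SS,mmm` timestamp is converted to integer milliseconds, so `00:00:17,039` becomes 17039.

```python
import re

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a

00:00:17,039 --> 00:00:28,470
will come from an unexpected place

00:00:18,039 --> 00:00:19,470

00:00:20,039 --> 00:00:21,470

00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give

00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it

00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''

# One timestamp, e.g. "00:00:17,039"
STAMP = r'(\d{2}:\d{2}:\d{2},\d{3})'
# A "start --> end" line followed by the (possibly empty) caption line after it
BLOCK_RE = re.compile(STAMP + r' --> ' + STAMP + r'\n(.*)')


def to_ms(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to integer milliseconds."""
    hms, millis = stamp.split(',')
    hours, minutes, seconds = (int(part) for part in hms.split(':'))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def captions_starting_between(text, lo, hi):
    """Hypothetical helper: captions of blocks whose start time is in [lo, hi]."""
    lo_ms, hi_ms = to_ms(lo), to_ms(hi)
    return [caption
            for start, _end, caption in BLOCK_RE.findall(text)
            if lo_ms <= to_ms(start) <= hi_ms]


print(captions_starting_between(s, '00:00:17,039', '00:00:30,119'))
```

Since these timestamps are zero-padded and fixed-width, comparing the strings directly would give the same order; converting to milliseconds just makes the comparison explicit and allows arithmetic such as computing durations. If a block should only count when it also ends inside the range, `_end` could be checked against `hi_ms` as well.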
