Regex pattern only matches halfway

Problem description

I'm trying to match the following data in a way that lets me extract the text between the timecodes.

subs='''

1
00:00:00,130 --> 00:00:01,640

where you there when it 
happened?

Who else saw you?

2
00:00:01,640 --> 00:00:03,414


This might be your last chance to


come clean. Take it or leave it.
'''

Regex=re.compile(r'(\d\d:\d\d\:\d\d,\d\d\d) --> (\d\d:\d\d\:\d\d,\d\d\d)(\n.+)((\n)?).+')

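My regex matches the first timecode line and the first line of text, but it returns only a few characters from the second line instead of the whole line. How can I make it match everything between one timecode and the next?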
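A likely cause: in Python's `re` module, `.` matches any character except a newline unless the `re.DOTALL` flag is set, so each `.+` in the pattern above stops at the end of a line. The short sketch below (the sample string is a two-line excerpt of the question's data; `default_match` and `dotall_match` are illustrative names) shows the difference:

```python
import re

# Two subtitle lines separated by a line break, taken from the question's data
line_pair = "where you there when it\nhappened?"

# Default behaviour: "." does not match "\n", so ".+" stops at the first line break
default_match = re.search(r".+", line_pair).group()

# With re.DOTALL, "." also matches "\n", so ".+" runs to the end of the string
dotall_match = re.search(r".+", line_pair, re.DOTALL).group()

print(repr(default_match))  # 'where you there when it'
print(repr(dotall_match))   # 'where you there when it\nhappened?'
```

Note that a greedy `.+` under `re.DOTALL` would run past the next timecode too, so it needs to be combined with a lazy quantifier or a stopping condition.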

Tags: python regex

Solution


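I'm not sure, but I think the solution below suits your case better.
Note: with this approach you can not only extract the text between the timecodes but also associate each piece of text with its timecode.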

import re

multiline_text = """

1 00:00:00,130 --> 00:00:01,640

where you there when it happened?

Who else saw you?

2 00:00:01,640 --> 00:00:03,414

This might be your last chance to

come clean. Take it or leave it.
"""

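# Match a timecode line such as "00:00:00,130 --> 00:00:01,640"
# (a raw string, so "\d" and "\s" reach the regex engine unchanged)
timecode_re = re.compile(r'([\d:,]{12})(\s-->\s)([\d:,]{12})')

lines = multiline_text.split('\n')
subtitles = {}      # maps each "start --> end" timecode to the text that follows it
current_key = None  # timecode of the block currently being read

for line in lines:
    is_key_match_obj = timecode_re.search(line)
    if is_key_match_obj:
        # A timecode line starts a new block; the whole "start --> end" match is the key
        current_key = is_key_match_obj.group()
        continue

    # Ignore everything before the first timecode
    if current_key:
        if current_key in subtitles:
            if not line:
                # Blank lines separate the text lines, so keep them as newlines
                subtitles[current_key] += '\n'
            else:
                subtitles[current_key] += line
        else:
            subtitles[current_key] = line

print(subtitles)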
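For comparison, the question's single-regex approach can also work if `.` is allowed to span lines. The sketch below is an alternative, not the answer's method: it assumes the SRT layout used in the question's `subs` (an index line, then a timecode line, then text), matches with `re.DOTALL` and a lazy `(.*?)`, and stops at a lookahead for the next "index + timecode" pair or the end of the string. The names `timecode`, `block_re`, and `subtitles` are illustrative.

```python
import re

# The subtitle data from the question (index line, timecode line, then text)
subs = '''

1
00:00:00,130 --> 00:00:01,640

where you there when it
happened?

Who else saw you?

2
00:00:01,640 --> 00:00:03,414


This might be your last chance to


come clean. Take it or leave it.
'''

timecode = r"\d\d:\d\d:\d\d,\d\d\d"
block_re = re.compile(
    rf"({timecode}) --> ({timecode})"  # groups 1 and 2: start and end timecodes
    r"(.*?)"                           # group 3: the text, matched lazily across lines
    rf"(?=\n\d+\n{timecode}|\Z)",      # stop before the next index + timecode, or at the end
    re.DOTALL,                         # let "." match "\n" as well
)

subtitles = {}
for m in block_re.finditer(subs):
    key = f"{m.group(1)} --> {m.group(2)}"
    # Join the non-blank text lines with single spaces
    text = " ".join(line.strip() for line in m.group(3).splitlines() if line.strip())
    subtitles[key] = text

print(subtitles)
```

Joining the non-blank lines with spaces is a design choice that yields one clean sentence per block; use `m.group(3).strip()` instead if you want to keep the original line breaks.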
