How to manipulate strings in an SRT file in Python?

Problem description

Suppose I have an SRT file like this:

1
00:00:00672 --> 00:00:05568
This is about

2
00:00:05664 --> 00:00:11175
whatever

3
00:00:11303 --> 00:00:16359
I don't know

4
00:00:16423 --> 00:00:20647
you don't know

The format is broken, though: the comma before the milliseconds is missing from every timestamp. It should look like this:

1
00:00:00,672 --> 00:00:05,568
This is about

2
00:00:05,664 --> 00:00:11,175
whatever

3
00:00:11,303 --> 00:00:16,359
I don't know

4
00:00:16,423 --> 00:00:20,647
you don't know

How can I fix it with Python? Thanks.

Tags: python, string, text

Solution


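You can match the hh:mm:ss part of each timestamp at a word boundary and use a lookahead to assert that exactly three digits (the milliseconds) follow. Because the pattern is not tied to the --> separator, it matches both the start and the end timestamp, and it skips timestamps that already contain the comma.

\b\d{2}:\d{2}:\d{2}(?=\d{3}\b)

In the replacement, use the full match followed by a comma:

r"\g<0>,"

Python example:

import re

# hh:mm:ss, only when exactly three more digits follow before a word boundary
regex = r"\b\d{2}:\d{2}:\d{2}(?=\d{3}\b)"

s = ("1\n"
    "00:00:00672 --> 00:00:05568\n"
    "This is about\n\n"
    "2\n"
    "00:00:05664 --> 00:00:11175\n"
    "whatever\n\n"
    "3\n"
    "00:00:11303 --> 00:00:16359\n"
    "I don't know\n\n"
    "4\n"
    "00:00:16423 --> 00:00:20647\n"
    "you don't know")

# \g<0> is the whole match, so this appends a comma after hh:mm:ss
result = re.sub(regex, r"\g<0>,", s)

print(result)

Output

1
00:00:00,672 --> 00:00:05,568
This is about

2
00:00:05,664 --> 00:00:11,175
whatever

3
00:00:11,303 --> 00:00:16,359
I don't know

4
00:00:16,423 --> 00:00:20,647
you don't know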
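
The question is about an SRT file on disk rather than a string literal, so here is a minimal sketch of applying this kind of substitution to a file. It assumes a pattern that matches hh:mm:ss followed by exactly three digits, so both the start and the end timestamp are repaired. The names fix_srt_timestamps and fix_srt_file are hypothetical helpers made up for illustration, and the UTF-8 default encoding is an assumption about the file.

```python
import re
from pathlib import Path

# Matches hh:mm:ss only when exactly three more digits follow (the
# milliseconds that lost their comma). Timestamps that already contain
# the comma do not match, so running the fix twice is harmless.
MISSING_COMMA = re.compile(r"\b\d{2}:\d{2}:\d{2}(?=\d{3}\b)")


def fix_srt_timestamps(text: str) -> str:
    """Insert the missing comma on timing lines and leave other lines alone."""
    fixed_lines = []
    for line in text.splitlines(keepends=True):
        # Timing lines are identified by the "-->" separator (assumption:
        # subtitle text itself does not contain "-->").
        if "-->" in line:
            line = MISSING_COMMA.sub(r"\g<0>,", line)
        fixed_lines.append(line)
    return "".join(fixed_lines)


def fix_srt_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Read src, repair its timestamps, and write the result to dst."""
    text = Path(src).read_text(encoding=encoding)
    Path(dst).write_text(fix_srt_timestamps(text), encoding=encoding)
```

For example, fix_srt_file("broken.srt", "fixed.srt") would write a repaired copy next to the original (both file names are placeholders). Restricting the substitution to lines that contain --> is a design choice: it keeps the regex away from subtitle text that happens to include something shaped like a time.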
