How to cut part of a text and replace it on each line with Python and RegEx

Problem description

Hello, I'm a Python beginner who has just started learning it and using RegEx for text manipulation. I apologize if I have broken any StackOverflow rules.

I'm writing a Python script that takes (cuts) the date and times from the first line and substitutes them for "Date", "TimeWindowStart", and "TimeWindowEnd" on every following line:

ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000

I know how to match the date with a regular expression:

([0-9][0-9]|2[0-9])/[0-9][0-9](/[0-9][0-9][0-9][0-9])?

and how to match the time:

([0-9][0-9]|2[0-9]):[0-9][0-9](:[0-9][0-9])?

But I'm stuck on how to select part of the text, copy it, and then find the text I want to replace using the re.sub function.

So the final output should look like this:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000

Tags: python, regex

Solution


Here is my code:

import re

s = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

datereg = r'(\d{2}/\d{2}/\d{4})'
timereg = r'(\d{2}:\d{2}:\d{2})'

dates = re.findall(datereg, s)
times = re.findall(timereg, s)

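# Blank the values in the header first, then fill in the placeholders one at a time.
result = re.sub(datereg, '', s)
result = re.sub(timereg, '', result)
result = re.sub(r'\bTimeWindowStart\b,', times[0] + ',', result)
result = re.sub(r'\bTimeWindowEnd\b,', times[1] + ',', result)
result = re.sub(r'\bDate\b', dates[0], result)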

print(result)

Output:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
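
As a possible alternative to chaining several re.sub calls, the header values could be captured once with named groups and then filled into the remaining lines in a single pass by giving re.sub a replacement function. This is only a sketch under a few assumptions: the first line always has exactly the format shown in the question, the placeholders appear only on the lines after it, and the names header_re, placeholder_re, and fill_report are made up for this illustration.

```python
import re

s = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

# Named groups use the placeholder names, so groupdict() can serve
# directly as the lookup table for the replacement step.
header_re = re.compile(
    r'ReportDate=(?P<Date>\d{2}/\d{2}/\d{4}), '
    r'TimeWindowStart=(?P<TimeWindowStart>\d{2}:\d{2}:\d{2}), '
    r'TimeWindowEnd=(?P<TimeWindowEnd>\d{2}:\d{2}:\d{2})'
)
placeholder_re = re.compile(r'\b(Date|TimeWindowStart|TimeWindowEnd)\b')


def fill_report(text):
    """Move the header values into the placeholder lines (hypothetical helper)."""
    # Split off the first line; everything after it is the body.
    header, sep, body = text.partition('\n')
    match = header_re.fullmatch(header.strip())
    if match is None:
        raise ValueError('header line does not have the expected format')
    values = match.groupdict()
    # Blank out each "key=value" in the header, keeping "key=".
    blank_header = re.sub(r'=[^,]*', '=', header)
    # Replace every placeholder in the body by looking up its captured value.
    new_body = placeholder_re.sub(lambda m: values[m.group(1)], body)
    return blank_header + sep + new_body


print(fill_report(s))
```

Because the header is split off before the substitution, the placeholder pattern never touches the first line, so it does not need the trailing comma that the code above relies on to avoid matching "TimeWindowStart=" there.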
