Python: delete a variable number of lines from a txt file

Problem description

I have several txt files whose lines are structured as follows:

File 1

Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Header2, #012345 (random numbers)
data content (to the end of file)

File 2

Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Redundant line 4
Header2, #67891 (random numbers)
data content (to the end of file)

File 3

Header1, xx, yy
Redundant line 1
Redundant line 2
Header2, #54321 (random numbers)
data content (to the end of file)

Expected output:

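For each file, I want to delete the redundant lines and keep only Header1, the Header2 line with its #zzzzz number, and the data content that follows through the end of the file. The result should be saved to a new, separate file, so that each new file has the following structure: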

Header1, xx, yy
Header2, #zzzzz (keep random numbers from original file)
data content (to the end of file)

My code:

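My code does not work for every file, because the number of redundant lines varies from file to file. Could someone offer some suggestions? Thanks!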

with open('File1.txt') as old, open('new_file1.txt', 'w') as new:
    lines = old.readlines()
    new.writelines(lines[0:1]) #keep Header1
    new.writelines(lines[N:]) #keep Header2 and following data content to the end

Tags: python

Solution


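You can define a variable N with an initial value of 1 and keep incrementing it by 1 until a line matches the regular expression .*?,\s*#\d+ (the second header). The \s* allows for the space after the comma in the sample files, and the bounds check stops the loop if no such line exists.

import re

with open('File1.txt') as old, open('new_file1.txt', 'w') as new:
    lines = old.readlines()
    new.writelines(lines[:1])  # keep Header1
    N = 1  # index of the first line after Header1
    # advance N until the Header2 line is found (or the file ends)
    while N < len(lines) and not re.match(r".*?,\s*#\d+", lines[N]):
        N += 1
    new.writelines(lines[N:])  # keep Header2 and the data content through the end of the file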
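
As a quick check of the pattern itself, the sketch below runs a whitespace-tolerant form, `.*?,\s*#\d+`, against the sample lines from the question. The names `HEADER2`, `STRICT`, and `samples` are hypothetical, introduced only for illustration. It also shows that a form without `\s*` would not match the sample Header2 line, since that line has a space between the comma and the `#`.

```python
import re

# Hypothetical names, for illustration only.
HEADER2 = re.compile(r".*?,\s*#\d+")  # tolerates whitespace after the comma
STRICT = re.compile(r".*?,#\d+")      # requires '#' directly after the comma

samples = [
    "Header1, xx, yy",
    "Redundant line 1",
    "Header2, #012345 (random numbers)",
    "Header2,#67891",
]

matches = [bool(HEADER2.match(s)) for s in samples]
print(matches)  # prints [False, False, True, True]

# The strict form misses the sample header because of the space after the comma.
print(STRICT.match("Header2, #012345 (random numbers)"))  # prints None
```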

Input file File1.txt

Header1, xx, yy
Redundant line 1
Redundant line 2
Redundant line 3
Header2, #012345 (random numbers)
data content (to the end of file)

Output file new_file1.txt

Header1, xx, yy
Header2, #012345 (random numbers)
data content (to the end of file)
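
Since the question involves several files, the same idea could be wrapped in a function and applied to each one. The following is a minimal sketch, not a definitive implementation: the function name `strip_redundant_lines`, the constant `HEADER2`, the input names `File2.txt` and `File3.txt`, and the `new_` output naming are all assumptions made for illustration. It reads the file line by line instead of calling `readlines()`, so the whole file is never held in memory.

```python
import re
from pathlib import Path

# Assumed pattern for the Header2 line: a comma, optional whitespace, '#', digits.
HEADER2 = re.compile(r".*?,\s*#\d+")


def strip_redundant_lines(src, dst):
    """Copy src to dst, keeping the first line (Header1) and everything
    from the Header2 line onward. Returns True if Header2 was found."""
    found = False
    with open(src) as old, open(dst, "w") as new:
        new.write(old.readline())  # keep Header1
        for line in old:
            if not found and HEADER2.match(line):
                found = True  # Header2 reached: keep this line and the rest
            if found:
                new.write(line)
    return found


if __name__ == "__main__":
    # File names are hypothetical; missing files are skipped.
    for name in ("File1.txt", "File2.txt", "File3.txt"):
        src = Path(name)
        if src.exists():
            dst = src.with_name(f"new_{src.stem.lower()}.txt")
            if not strip_redundant_lines(src, dst):
                print(f"No Header2 line found in {src}")
```

If no Header2 line is found, the output file contains only Header1 and the function returns False, so the caller can decide how to handle that file.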
