How to read from any line

Problem description

There is a directory containing these files:

ab_list
bd_list
cd_list
mno_list
hk_list 
pd_list

Outside this directory I also have another file named testfile:

abc
que nw

ab_list   ON   8
gs_list   ON   9
hk_list   OFF  9
bd_list   ON   7
cd_list   OFF  6
fr_list   ON   5
mno_list  ON   4
pq_list   OFF   6
jk_list   ON   7
pd_list   OFF  8

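I want to compare the two: every file in the directory whose name appears in testfile with ON should have its contents merged into a new file called merge_file. The other files that match an entry in testfile but are marked OFF should have their file names printed in new_file.

Assuming the directory is named Folder and contains another directory named folder, this code does that: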

    from glob import glob

    test_file_directory = "C:\\Users\\User\\Desktop\\Folder\\"

    files1 = glob("*.txt")
    with open(test_file_directory+"testfile.txt","r") as f:
        files2 = [' '.join([l.split()[0],l.split()[1]]) for l in f.readlines()[3:]]

    for f1 in files1:
        for f2 in files2:
            # files2 entries are joined with a single space, so compare with one space
            if f1[:-4]+' ON' == f2:
                #print('match')
                with open('merge_file.txt','a') as a:
                    with open(f1,'r') as r:
                        a.write(r.read()+'\n')
            elif f1[:-4]+' OFF' == f2:
                #print('match')
                with open('match_file.txt','a') as a:
                    with open(f1,'r') as r:
                        a.write(f"{f2} {len(r.readlines())}\n")

Here, this code reads the file from the 4th line onward with files2 = [' '.join([l.split()[0],l.split()[1]]) for l in f.readlines()[3:]], but now I want it to work generally for all files of this kind, whether the list starts on the 4th, 5th, 1st, or any other line. When I remove [3:], it fails on that line with: index error: list out of range. Can anyone help me?

Tags: python, python-3.x, merge

Solution


If you can help it, it would be better to store your testfile.txt in another file format better suited to this task, such as CSV.
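As a minimal sketch of the CSV suggestion, assuming the list is saved with a hypothetical header row file,state,count (these column names are an assumption, not something from the question), the standard csv module can read it without any line slicing:

```python
import csv
import io

# Hypothetical CSV version of testfile; the column names are an assumption.
SAMPLE = """file,state,count
ab_list,ON,8
hk_list,OFF,9
bd_list,ON,7
"""


def read_merge_list(handle):
    """Return (file name, merge flag) pairs from an open CSV file object."""
    reader = csv.DictReader(handle)
    return [(row["file"], row["state"] == "ON") for row in reader]


# io.StringIO stands in for a real file opened with open(..., newline="").
entries = read_merge_list(io.StringIO(SAMPLE))
print(entries)
```

With a header row naming the columns, there are no free-form lines to skip, so the question of which line to start from no longer arises.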

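However, I have written this class, which should do what you are after. You may need to add extra condition checks: for example, it does not care whether merge_file or new_file already exist, it simply appends to them.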

import re
import os
class Merge:
    def __init__(self, merge_list_file, merge_dir, merge_file, unmerged_file):
        self.merge_list_file = merge_list_file
        self.merge_dir = merge_dir
        self.merge_file = merge_file
        self.unmerged_file = unmerged_file
        self.__parse_list()

    def __parse_list(self):
        # Match lines such as "ab_list   ON   8"; header and blank lines are skipped.
        # \s+ after the name keeps padding spaces out of the captured file name.
        pattern = re.compile(r"^(.+?)\s+(ON|OFF)\s+\d+\s*$")
        self.merge_list = []
        with open(self.merge_list_file, 'r') as ml:
            for line in ml:
                match = pattern.search(line)
                if match:
                    self.merge_list.append({'file': match.group(1), 'merge': match.group(2) == 'ON'})
        return self.merge_list

    def merge(self):
        # Both output files are opened in append mode, so existing content is kept.
        with open(self.merge_file, 'a') as merge_file, open(self.unmerged_file, 'a') as unmerge_file:
            for entry in self.merge_list:
                file = entry['file']
                if not entry['merge']:
                    # OFF entries: record only the file name.
                    unmerge_file.write(file + "\n")
                    continue
                to_merge = os.path.join(self.merge_dir, file)
                if not os.path.exists(to_merge):
                    # Listed as ON but missing from the directory: skip it.
                    continue
                with open(to_merge) as f:
                    merge_file.write(f.read())

You would use it like this:

if __name__ == '__main__':
    Merge('testfile.txt', 'Folder', 'merge_file.txt', 'new_file.txt').merge()

where my directory structure is:

C:.
|   merge.py
|   testfile.txt
|
\---Folder
        A
        B
        C

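This results in merge_file.txt containing AB (the contents of A and B appended together) and new_file.txt containing C (each 'OFF' file name is appended on its own line).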
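On the original error: without [3:], the list comprehension also processes the header lines, and l.split()[1] is out of range for abc (one field) while l.split()[0] is out of range for the blank line (no fields). A minimal split-based sketch that skips any line not shaped like "name ON|OFF number", so the list may start on any line; parse_entries is a hypothetical helper name, not part of the code above:

```python
def parse_entries(lines):
    """Return (file name, state) pairs, ignoring lines that are not list entries."""
    entries = []
    for line in lines:
        fields = line.split()
        # A list entry has exactly three fields: name, ON/OFF, and a number.
        if len(fields) == 3 and fields[1] in ("ON", "OFF") and fields[2].isdigit():
            entries.append((fields[0], fields[1]))
    return entries


# The first three lines mimic the header and blank line of testfile.
sample = ["abc\n", "que nw\n", "\n", "ab_list   ON   8\n", "hk_list   OFF  9\n"]
result = parse_entries(sample)
print(result)
```

Filtering on the shape of each line, rather than on its position, is the same idea the regular expression in the class relies on.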

