How to read several log files and split them into lines by a pattern

Problem description

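I have a directory in which all the log files end in *.log. Is it possible to read all the files, combine them into one big file, and start a new line only where a date is found? The log files look like this: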

2019-04-15 21:58:07 bla bla bla
2019-04-15 21:58:08 bla bla bla bla
2019-04-15 21:58:09 bla bla bla
test1
test2
test3
2019-04-15 21:59:02 bla bla
2019-04-15 21:59:05 bla bla bla
test
now
go

Now I want to split this file into lines wherever a date is found, so that it looks like this:

2019-04-15 21:58:07 bla bla bla
2019-04-15 21:58:08 bla bla bla bla
2019-04-15 21:58:09 bla bla bla test1 test2 test3
2019-04-15 21:59:02 bla bla
2019-04-15 21:59:05 bla bla bla test now go

Can someone help me?

Kind regards

Tags: python, python-3.x

Solution


It isn't pretty and it could be more efficient, but it works:

import os, re

# change this to be wherever you keep all those log files
work_dir = '/home/ubuntu/workspace/bin/tmp'

# load the full path for all files in the work_dir (I'm not checking if file is a .log file)
logs = [os.path.join(work_dir, file) for file in os.listdir(work_dir) if os.path.isfile(os.path.join(work_dir, file))]


def process_list(in_list):
    date_patt = r'\d{4}-\d{2}-\d{2}[\s]+\d{2}:\d{2}:\d{2}'
    last_good_idx = 0
    for idx in range(len(in_list)):
        if re.search(date_patt, in_list[idx]):
            last_good_idx = idx
        elif in_list[idx].strip():
            # skip blank lines so no stray trailing space gets appended
            in_list[last_good_idx] += f' {in_list[idx].strip()}'

    return in_list

def clean_list(in_list):
    date_patt = r'\d{4}-\d{2}-\d{2}[\s]+\d{2}:\d{2}:\d{2}'
    for elem in in_list[:]:
        if not re.search(date_patt, elem):
            in_list.remove(elem)
    return in_list

# write master log to working directory file called master.log
with open(os.path.join(work_dir, 'master.log'), 'w') as out:
    for file in logs:
        with open(file, 'r') as f:
            file_text = f.read()
            text_list = file_text.split('\n')
            text_list = process_list(text_list)
            text_list = clean_list(text_list)

            for line in text_list:
                out.write(line + '\n')

If you only want to use files ending in .log, add that condition to the list comprehension that assigns the logs variable.
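A minimal sketch of that filter, assuming the same flat directory layout as above. The helper name list_log_files is hypothetical, and skipping master.log is an added assumption so that a second run does not read its own output:

```python
import os

def list_log_files(work_dir, skip=('master.log',)):
    """Return sorted full paths of regular files in work_dir that end in .log."""
    return sorted(
        os.path.join(work_dir, name)
        for name in os.listdir(work_dir)
        if name.endswith('.log')                          # only *.log files
        and name not in skip                              # leave the output file out
        and os.path.isfile(os.path.join(work_dir, name))  # ignore directories
    )
```

The result could then be assigned to logs in place of the original list comprehension, for example logs = list_log_files(work_dir).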

process_list takes each line that does not match the date_patt regular expression and appends it to the end of the string at the last index where date_patt matched.

clean_list removes from the input list any line that does not match date_patt.
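The work of process_list and clean_list can also be folded into a single pass. The following is a sketch, not the answer's code: merge_continuation_lines and DATE_PATT are hypothetical names, and it assumes a record starts only when the timestamp sits at the beginning of a line (re.match rather than re.search). Lines that appear before the first timestamp are dropped, which is what clean_list would do with them.

```python
import re

# Assumption: every record begins with "YYYY-MM-DD HH:MM:SS"
DATE_PATT = re.compile(r'\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}')

def merge_continuation_lines(lines):
    """Join each non-timestamp line onto the preceding timestamped line."""
    merged = []
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # ignore blank lines
        if DATE_PATT.match(text):
            merged.append(text)           # start of a new record
        elif merged:
            merged[-1] += ' ' + text      # continuation of the previous record
        # otherwise no record has started yet, so the line is dropped
    return merged

# Sample input taken from the question
sample = """2019-04-15 21:58:07 bla bla bla
2019-04-15 21:58:08 bla bla bla bla
2019-04-15 21:58:09 bla bla bla
test1
test2
test3
2019-04-15 21:59:02 bla bla
2019-04-15 21:59:05 bla bla bla
test
now
go
"""

if __name__ == '__main__':
    print('\n'.join(merge_continuation_lines(sample.split('\n'))))
```

Because the function builds a new list instead of editing the input in place, it avoids the repeated list.remove calls and can be fed a file object directly, since iterating over an open file yields its lines.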

