Python script to merge and consolidate log files

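Problem description

I have two log files in different formats (sample_1.log and sample_2.log). Both logs record the same attributes, as the samples below show.

sample_1.log has a JSON-like pattern, like this: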

2020-07-22 07:22:37.822863: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "3521" }  
2020-09-22 12:31:44.789319: { "level": "Exception", "message": "Error1", "user": "administrator", "id": "4371" }  
2021-04-06 22:51:10.999642: { "level": "ERROR", "message": "Error3", "user": "azureuser", "id": "2294" }  
2020-07-19 15:45:58.576940: { "level": "Exception", "message": "Error2", "user": "hacker40", "id": "8677" }  
2021-01-23 14:07:18.922480: { "level": "ERROR", "message": "Error5", "user": "hacker40", "id": "1865" }  
2020-08-19 05:47:46.983299: { "level": "Exception", "message": "Error4", "user": "pi", "id": "8993" }  
2021-03-25 13:13:06.012237: { "level": "ERROR", "message": "Error5", "user": "mysql", "id": "3561" }  
2020-05-05 06:37:50.976402: { "level": "ERROR", "message": "Error5", "user": "pi", "id": "1754" } 

sample_2.log has a more generic format, like this:

ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet.  
ERROR at 12:08:30 on Mon, 26/10/2020  - Error1 - tracking id is 5567 - user is ftp.  
Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.  
ERROR at 06:36:05 on Sat, 11/07/2020  - Error1 - tracking id is 5218 - user is puppet.  
Exception at 17:40:33 on Fri, 01/01/2021  - Error3 - tracking id is 8252 - user is mysql.  
Exception at 13:49:18 on Wed, 06/05/2020  - Error2 - tracking id is 8369 - user is hacker15.  
Exception at 21:09:05 on Sat, 16/05/2020  - Error1 - tracking id is 5091 - user is adm.  
ERROR at 10:39:13 on Thu, 29/04/2021  - Error2 - tracking id is 4225 - user is oracle.  

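Is it possible to write a Python script that reads both log files, converts them to a common format, and sorts the events by date?

The consolidation script should take two input arguments and one output argument:

consolidate.py sample_1.log sample_2.log output.log

The output format can be the same as sample_2.log.

I appreciate any suggestions.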

Edit

Someone helped me convert the contents of sample_1.log into the same format as sample_2.log:


import re
from datetime import datetime

# Weekday abbreviations indexed by datetime.weekday() (Monday is 0).
DAY_OF_WEEK = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Example line:
# 2020-06-24 19:07:54.153862: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "9293" }
PATTERN = re.compile(
    r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+: '
    r'\{ "level": "(.+?)", "message": "(.+?)", "user": "(.+?)", "id": "(.+?)" \}'
)

with open("sample_1.log", "r") as file:
    log = file.read().splitlines()

for line in log:
    match = PATTERN.search(line)
    if match is None:
        continue  # skip blank or malformed lines

    timestamp, level, message, user, tracking_id = match.groups()
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")

    day = DAY_OF_WEEK[parsed.weekday()]
    time_ = parsed.strftime("%H:%M:%S")
    # sample_2.log writes dates as DD/MM/YYYY, not YYYY/MM/DD.
    date_ = parsed.strftime("%d/%m/%Y")

    new_format = f'{level} at {time_} on {day}, {date_}  - {message} - tracking id is {tracking_id} - user is {user}.'
    print(new_format)

How can I improve the script, or append the output of the Python script above to sample_2.log and sort the result by date?

I hope someone can help me now.
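
For the sorting step asked about above, here is a minimal sketch rather than a definitive answer. It assumes every line is already in the sample_2.log layout, pulls the time and date back out of each line, and uses them as the sort key. The helper name `sort_key` and the regular expression are illustrative assumptions based only on the sample lines shown in the question.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample_2.log lines in the question:
# "ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet."
STAMP = re.compile(r" at (\d{2}:\d{2}:\d{2}) on \w{3}, (\d{2}/\d{2}/\d{4}) ")


def sort_key(line):
    """Return the datetime embedded in a sample_2-style line (hypothetical helper)."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    clock, day = match.groups()
    # The date is DD/MM/YYYY: 02/07/2020 is labelled Thu, which fits 2 July 2020.
    return datetime.strptime(f"{day} {clock}", "%d/%m/%Y %H:%M:%S")


lines = [
    "ERROR at 12:08:30 on Mon, 26/10/2020  - Error1 - tracking id is 5567 - user is ftp.",
    "Exception at 21:35:12 on Sun, 28/06/2020  - Error5 - tracking id is 8077 - user is vagrant.",
    "ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet.",
]

# Sort chronologically, oldest event first.
for line in sorted(lines, key=sort_key):
    print(line)
```

Sorting the raw text would not work here, because the lines start with the level and the date is written day-first; parsing to `datetime` avoids both problems.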

Tags: python, logging

Solution
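
One possible approach, offered as a sketch under stated assumptions rather than a definitive implementation: parse each line of either file into a common record, sort the records by timestamp, and write them out in the sample_2.log layout. It assumes the payload in sample_1.log is valid JSON (the question only calls it "JSON-like") and that sample_2.log dates are DD/MM/YYYY. The names `parse_line`, `format_event` and `consolidate` are hypothetical.

```python
import json
import re
import sys
from datetime import datetime

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# sample_1.log style: "2020-07-22 07:22:37.822863: { ...JSON object... }"
JSON_LINE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): (\{.*\})")
# sample_2.log style:
# "ERROR at 19:30:13 on Thu, 02/07/2020  - Error1 - tracking id is 4126 - user is puppet."
TEXT_LINE = re.compile(
    r"(\S+) at (\d{2}:\d{2}:\d{2}) on \w{3}, (\d{2}/\d{2}/\d{4})\s+- (.+?)"
    r" - tracking id is (\S+) - user is (.+?)\.?\s*$"
)


def parse_line(line):
    """Return (when, level, message, tracking_id, user), or None if unrecognised."""
    match = JSON_LINE.match(line)
    if match:
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        data = json.loads(match.group(2))  # assumes the payload is valid JSON
        return when, data["level"], data["message"], data["id"], data["user"]
    match = TEXT_LINE.match(line)
    if match:
        level, clock, day, message, tracking_id, user = match.groups()
        when = datetime.strptime(f"{day} {clock}", "%d/%m/%Y %H:%M:%S")
        return when, level, message, tracking_id, user
    return None


def format_event(event):
    """Render a record in the sample_2.log layout (sub-second precision is dropped)."""
    when, level, message, tracking_id, user = event
    stamp = f"{when:%H:%M:%S} on {DAYS[when.weekday()]}, {when:%d/%m/%Y}"
    return f"{level} at {stamp}  - {message} - tracking id is {tracking_id} - user is {user}."


def consolidate(first_path, second_path, output_path):
    """Merge both logs into output_path, sorted oldest first."""
    events = []
    for path in (first_path, second_path):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                event = parse_line(line.strip())
                if event is not None:  # blank or malformed lines are skipped
                    events.append(event)
    events.sort(key=lambda event: event[0])
    with open(output_path, "w", encoding="utf-8") as handle:
        for event in events:
            handle.write(format_event(event) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        consolidate(*sys.argv[1:])
    else:
        print("usage: consolidate.py sample_1.log sample_2.log output.log")
```

Saved as consolidate.py, this would be run as `python consolidate.py sample_1.log sample_2.log output.log`. Because each line is tested against both patterns, the order of the two input files does not matter; lines matching neither pattern are silently skipped, which may or may not be what you want.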

