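python - Python script to merge and consolidate log files
Problem description
I have 2 log files in different formats (sample_1.log and sample_2.log). Both logs carry the same attributes:
- Timestamp (2020-07-22 07:22:37.822863)
- Error category (Exception or ERROR)
- Error message (Error1, Error2, ...)
- User (oracle, administrator, ...)
- Tracking ID (3521, 2294, ...)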
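Because both files carry the same five attributes, one option is to parse every line into a single record type first and only worry about output formatting at the end. The sketch below assumes a hypothetical `LogEvent` dataclass (not part of the question). Only the timestamp takes part in comparisons, so a plain `sort()` puts events in chronological order.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class LogEvent:
    # Only the timestamp is compared, so sorting orders events chronologically.
    timestamp: datetime
    level: str = field(compare=False)        # "Exception" or "ERROR"
    message: str = field(compare=False)      # "Error1", "Error2", ...
    user: str = field(compare=False)         # "oracle", "administrator", ...
    tracking_id: str = field(compare=False)  # "3521", "2294", ...


events = [
    LogEvent(datetime(2021, 4, 6, 22, 51, 10), "ERROR", "Error3", "azureuser", "2294"),
    LogEvent(datetime(2020, 7, 22, 7, 22, 37), "Exception", "Error4", "oracle", "3521"),
]
events.sort()
print([e.tracking_id for e in events])  # ['3521', '2294']
```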
sample_1.log has a JSON-like layout, as follows:
2020-07-22 07:22:37.822863: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "3521" }
2020-09-22 12:31:44.789319: { "level": "Exception", "message": "Error1", "user": "administrator", "id": "4371" }
2021-04-06 22:51:10.999642: { "level": "ERROR", "message": "Error3", "user": "azureuser", "id": "2294" }
2020-07-19 15:45:58.576940: { "level": "Exception", "message": "Error2", "user": "hacker40", "id": "8677" }
2021-01-23 14:07:18.922480: { "level": "ERROR", "message": "Error5", "user": "hacker40", "id": "1865" }
2020-08-19 05:47:46.983299: { "level": "Exception", "message": "Error4", "user": "pi", "id": "8993" }
2021-03-25 13:13:06.012237: { "level": "ERROR", "message": "Error5", "user": "mysql", "id": "3561" }
2020-05-05 06:37:50.976402: { "level": "ERROR", "message": "Error5", "user": "pi", "id": "1754" }
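Each sample_1.log line looks like an ISO timestamp followed by a valid JSON object. Assuming that holds for every line, the payload can be handed to `json.loads` instead of being matched field by field with a regex. The function name `parse_sample1_line` is hypothetical.

```python
import json
from datetime import datetime


def parse_sample1_line(line):
    """Parse one sample_1.log line into (timestamp, level, message, user, id).

    Assumes every line is "<ISO timestamp>: <JSON object>", as in the excerpt.
    """
    # The timestamp contains colons but never ": " (colon + space),
    # so the first ": " separates it from the JSON payload.
    stamp, payload = line.strip().split(": ", 1)
    record = json.loads(payload)
    return (
        datetime.fromisoformat(stamp),
        record["level"],
        record["message"],
        record["user"],
        record["id"],
    )


line = '2020-07-22 07:22:37.822863: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "3521" }'
print(parse_sample1_line(line))
# (datetime.datetime(2020, 7, 22, 7, 22, 37, 822863), 'Exception', 'Error4', 'oracle', '3521')
```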
sample_2.log has a more generic, sentence-like format, as follows:
ERROR at 19:30:13 on Thu, 02/07/2020 - Error1 - tracking id is 4126 - user is puppet.
ERROR at 12:08:30 on Mon, 26/10/2020 - Error1 - tracking id is 5567 - user is ftp.
Exception at 21:35:12 on Sun, 28/06/2020 - Error5 - tracking id is 8077 - user is vagrant.
ERROR at 06:36:05 on Sat, 11/07/2020 - Error1 - tracking id is 5218 - user is puppet.
Exception at 17:40:33 on Fri, 01/01/2021 - Error3 - tracking id is 8252 - user is mysql.
Exception at 13:49:18 on Wed, 06/05/2020 - Error2 - tracking id is 8369 - user is hacker15.
Exception at 21:09:05 on Sat, 16/05/2020 - Error1 - tracking id is 5091 - user is adm.
ERROR at 10:39:13 on Thu, 29/04/2021 - Error2 - tracking id is 4225 - user is oracle.
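The sample_2.log dates are day/month/year: "Thu, 02/07/2020" is Thursday, 2 July 2020. A minimal parser for this layout might look like the following, assuming every line follows the excerpt exactly. The pattern and the name `parse_sample2_line` are hypothetical. The weekday name is matched but ignored, since `strptime` derives it from the date anyway.

```python
import re
from datetime import datetime

# Hypothetical pattern written against the excerpt above.
SAMPLE2_PATTERN = re.compile(
    r"(?P<level>\w+) at (?P<time>\d{2}:\d{2}:\d{2}) on \w{3}, (?P<date>\d{2}/\d{2}/\d{4})"
    r" - (?P<message>.+?) - tracking id is (?P<id>\d+) - user is (?P<user>.+)\.$"
)


def parse_sample2_line(line):
    """Parse one sample_2.log line into (timestamp, level, message, user, id)."""
    match = SAMPLE2_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised sample_2 line: {line!r}")
    # Day/month/year order, e.g. 02/07/2020 is 2 July 2020.
    timestamp = datetime.strptime(f"{match['date']} {match['time']}", "%d/%m/%Y %H:%M:%S")
    return timestamp, match["level"], match["message"], match["user"], match["id"]


print(parse_sample2_line("ERROR at 19:30:13 on Thu, 02/07/2020 - Error1 - tracking id is 4126 - user is puppet."))
```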
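Is it possible to write a Python script that reads both log files, converts them to the same format and sorts the events by date?
The consolidation script should take two input arguments and one output argument:
consolidate.py sample_1.log sample_2.log output.log
The output format can be the same as sample_2.log.
Any suggestion is appreciated.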
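If the output should look like sample_2.log, a `datetime` can render the weekday, date and time directly through format codes, so no manual weekday table is needed. This is a sketch with a hypothetical function name. Note that `%a` is locale-dependent. Python starts in the C locale, which gives English abbreviations unless `locale.setlocale` has been called.

```python
from datetime import datetime


def to_sample2_format(timestamp, level, message, user, tracking_id):
    """Render one event in the sample_2.log layout (microseconds are dropped)."""
    # %a = abbreviated weekday (English in the default C locale),
    # %d/%m/%Y = day/month/year, matching sample_2.log.
    return (
        f"{level} at {timestamp:%H:%M:%S} on {timestamp:%a, %d/%m/%Y} - {message}"
        f" - tracking id is {tracking_id} - user is {user}."
    )


print(to_sample2_format(datetime(2020, 7, 22, 7, 22, 37, 822863), "Exception", "Error4", "oracle", "3521"))
# Exception at 07:22:37 on Wed, 22/07/2020 - Error4 - tracking id is 3521 - user is oracle.
```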
Edit
Someone helped me convert the contents of sample_1.log into the same format as sample_2.log:
import re
from datetime import date

# Abbreviated weekday names as they appear in sample_2.log
DAY_OF_WEEK = {
    0: "Mon",
    1: "Tue",
    2: "Wed",
    3: "Thu",
    4: "Fri",
    5: "Sat",
    6: "Sun",
}

# Example input line:
# 2020-06-24 19:07:54.153862: { "level": "Exception", "message": "Error4", "user": "oracle", "id": "9293" }
pattern = r'(\d{4}-\d{2}-\d{2}) (\d\d:\d\d:\d\d\.\d+): \{ "level": "([^"]+)", "message": "([^"]+)", "user": "([^"]+)", "id": "([^"]+)" \}'

with open("sample_1.log", "r") as file:
    log = file.read().splitlines()

for e in log:
    match = re.search(pattern, e)
    if match is None:
        continue  # skip blank or malformed lines
    date_, time, level, message, user, tracking_id = match.groups()
    d = date.fromisoformat(date_)
    day = DAY_OF_WEEK[d.weekday()]
    time = time.split('.')[0]  # drop the microseconds
    dt = d.strftime('%d/%m/%Y')  # sample_2.log uses day/month/year
    newFormat = f'{level} at {time} on {day}, {dt} - {message} - tracking id is {tracking_id} - user is {user}.'
    print(newFormat)
How can I improve the script, or append the output of the Python script above to sample_2.log and sort everything by date?
I hope someone can help me now.
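One way to get the requested `consolidate.py sample_1.log sample_2.log output.log` behaviour is to parse both files into the same tuple, sort on the parsed `datetime`, and only then render every event in the sample_2.log layout. Sorting on real `datetime` values avoids comparing date strings, which would give the wrong order for day/month/year text. This is a minimal sketch assuming every non-blank line matches one of the two layouts shown in the question. All function names are hypothetical.

```python
import json
import re
import sys
from datetime import datetime

# Assumed sample_2.log layout, written against the excerpt in the question.
SAMPLE2 = re.compile(
    r"(?P<level>\w+) at (?P<time>\d{2}:\d{2}:\d{2}) on \w{3}, (?P<date>\d{2}/\d{2}/\d{4})"
    r" - (?P<message>.+?) - tracking id is (?P<id>\d+) - user is (?P<user>.+)\.$"
)


def parse_sample1(line):
    """'<ISO timestamp>: {json}' -> (timestamp, level, message, user, id)."""
    stamp, payload = line.split(": ", 1)
    record = json.loads(payload)
    return (datetime.fromisoformat(stamp), record["level"], record["message"],
            record["user"], record["id"])


def parse_sample2(line):
    """sample_2 sentence -> (timestamp, level, message, user, id)."""
    m = SAMPLE2.match(line)
    if m is None:
        raise ValueError(f"unrecognised sample_2 line: {line!r}")
    ts = datetime.strptime(f"{m['date']} {m['time']}", "%d/%m/%Y %H:%M:%S")
    return ts, m["level"], m["message"], m["user"], m["id"]


def render(event):
    """Event tuple -> sample_2.log layout (microseconds dropped)."""
    ts, level, message, user, tracking_id = event
    return (f"{level} at {ts:%H:%M:%S} on {ts:%a, %d/%m/%Y} - {message}"
            f" - tracking id is {tracking_id} - user is {user}.")


def consolidate(sample1_lines, sample2_lines):
    """Parse both inputs, sort chronologically, render as sample_2 lines."""
    events = [parse_sample1(line.strip()) for line in sample1_lines if line.strip()]
    events += [parse_sample2(line.strip()) for line in sample2_lines if line.strip()]
    # Sort on the timestamp only; the sort is stable for equal timestamps.
    events.sort(key=lambda event: event[0])
    return [render(event) for event in events]


def main(argv):
    if len(argv) != 3:
        print("usage: consolidate.py sample_1.log sample_2.log output.log", file=sys.stderr)
        return
    in1, in2, out = argv
    with open(in1) as f1, open(in2) as f2:
        lines = consolidate(f1.read().splitlines(), f2.read().splitlines())
    with open(out, "w") as f:
        f.writelines(line + "\n" for line in lines)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python consolidate.py sample_1.log sample_2.log output.log`. Writing to a separate output file, rather than appending to sample_2.log, keeps the original logs untouched and lets the script be re-run safely.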