regex - Log parsing using awk and sed
Problem description
I have log lines like these:
2019-11-14T09:42:14.150Z INFO ActivityEventRecovery-1 ActivityCacheManager - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Handling activity 0082bc26-70a6-433e-a470-
2019-11-14T09:43:08.097Z INFO L2HostConfigTaskExecutor2 TransportNodeAsyncServiceImpl - FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Calling uplinkTeamingChangeListener.onTransportNodeUpdated on TN 72f73c66-da37-11e9-8d68-005056bce6a5 revision 5
2019-11-14T09:43:08.104Z INFO L2HostConfigTaskExecutor2 Publisher - ROUTING [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Refresh mac address of Logical router port connected with VLAN LS for logical router LogicalRouter/f672164b-40cf-461f-9c8d-66fe1e7f8c19
2019-11-14T09:43:08.105Z INFO L2HostConfigTaskExecutor2 GlobalActivityRepository - - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Submitted activity 73e7a942-73d2-4967-85fa-7d9d6cc6042b in QUEUED state with dependency null exclusivity true and requestId null and time taken by dao.create is 1 ms
I want to parse these logs into JSON objects. So far I have been using a Python regular expression and putting the groups into a dictionary.
currentDict = {
"@timestamp" : regexp.group(1),
"Severity" : regexp.group(2),
"Thread" : regexp.group(3),
"Class" : regexp.group(4),
"Message-id" : regexp.group(5),
"Component" : regexp.group(6),
"Message" : regexp.group(7),
"id's" : re.findall(x[1], regexp.group(7))
}
But this approach is very slow: a 200 MB file takes 5-10 minutes.
The Python regular expression I am using: (\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d.\d\d\dZ)\s+(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+(.*?)\s+(.*?)\s+\-\s+(.*?)\s+(?:(\[?.*?\])?)\s(.*)
Expected output:
{"@timestamp" : "2019-11-14T09:42:14.150Z", "Severity" : "INFO", "Thread" : "ActivityEventRecovery-1", "Class" : "ActivityCacheManager - -", "Component" : "[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"]", "Message" : "Handling activity 0082bc26-70a6-433e-a470-"}
{"@timestamp" : "2019-11-14T09:43:08.097Z", "Severity" : "INFO", "Thread" : "L2HostConfigTaskExecutor2", "Class" : "TransportNodeAsyncServiceImpl - FABRIC", "Component" : "[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"]", "Message" : "Calling uplinkTeamingChangeListener.onTransportNodeUpdated on TN 72f73c66-da37-11e9-8d68-005056bce6a5 revision 5"}
{"@timestamp" : "2019-11-14T09:43:08.104Z", "Severity" : "INFO", "Thread" : "L2HostConfigTaskExecutor2", "Class" : "Publisher - ROUTING", "Component" : "[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"]", "Message" : "Refresh mac address of Logical router port connected with VLAN LS for logical router LogicalRouter/f672164b-40cf-461f-9c8d-66fe1e7f8c19"}
{"@timestamp" : "2019-11-14T09:43:08.105Z", "Severity" : "INFO", "Thread" : "L2HostConfigTaskExecutor2", "Class" : "GlobalActivityRepository - -", "Component" : "[nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"]", "Message" : "Submitted activity 73e7a942-73d2-4967-85fa-7d9d6cc6042b in QUEUED state with dependency null exclusivity true and requestId null and time taken by dao.create is 1 ms"}
I read online that this can be done faster with awk and sed, but I don't know much about them. How can I do this parsing with awk and sed?
Please help!
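For reference, the regex approach described in the question can be sketched as below. This is a minimal sketch, not the asker's actual code: the names `LINE_RE`, `UUID_RE`, `parse_line`, and `to_json_lines` are hypothetical, and `UUID_RE` is an assumed stand-in for the undefined `x[1]` pattern. The pattern is a tightened variant of the one in the question (the literal dot is escaped, and the lazy `.*?` groups are replaced with `\S+`), which may reduce backtracking; whether it is actually faster should be measured on real data.

```python
import json
import re

# Hypothetical tightened variant of the question's pattern, compiled once.
LINE_RE = re.compile(
    r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}Z)\s+"  # timestamp
    r"(INFO|WARN|DEBUG|ERROR|FATAL|TRACE)\s+"       # severity
    r"(\S+)\s+"                                     # thread
    r"(\S+)\s+-\s+(\S+)\s+"                         # class, message id
    r"(\[[^\]]*\])\s+"                              # component, brackets included
    r"(.*)"                                         # message
)

# Assumption: the ids in the message are UUIDs (stand-in for x[1] in the question).
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "@timestamp": m.group(1),
        "Severity": m.group(2),
        "Thread": m.group(3),
        "Class": m.group(4),
        "Message-id": m.group(5),
        "Component": m.group(6),
        "Message": m.group(7),
        "id's": UUID_RE.findall(m.group(7)),
    }


def to_json_lines(lines):
    """Yield one JSON string per matching line; json.dumps escapes inner quotes."""
    for line in lines:
        record = parse_line(line.rstrip("\n"))
        if record is not None:
            yield json.dumps(record)
```

Because `to_json_lines` is a generator, it can be fed an open file object directly, so the 200 MB file is never held in memory at once.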
Solution
# "in" is the input log file; a..f are the per-field files merged by paste at the end
# For timestamp
cut -d " " -f 1 in > temp
sed -i -e 's/^/{"@timestamp" : "/' temp
awk 'NF{print $0 "\", "}' temp > a
# For Severity ...
# For Thread ...
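# For Class (the field numbers depend on the exact spacing of the input;
# in the single-space-separated sample lines shown above, Class is fields 4-6)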
cut -d " " -f 5,6,7 in > temp
sed -i -e 's/^/"Class" : "/' temp
awk 'NF{print $0 "\", "}' temp > d
# For Component
grep -o -P '(?<=\[).*(?=\])' in > temp
sed -i -e 's/^/"Component" : \["/' temp
awk 'NF{print $0 "\"], "}' temp > e
# For Message ...
# Merge all files line by line
paste -d " " a b c d e f
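I'll briefly explain parts of this script: cut extracts the space-delimited fields, sed prepends text to the start of each line, and awk appends text to the end of each line.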
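To make the role of each tool concrete, here is a minimal Python sketch that mirrors the three timestamp steps one by one. The function name `timestamp_column` is hypothetical and the code is only an illustration of what the shell commands do, not a replacement for them.

```python
def timestamp_column(lines):
    """Mirror of the cut / sed / awk steps used for the timestamp field."""
    out = []
    for line in lines:
        # cut -d " " -f 1: keep the text before the first space
        field = line.rstrip("\n").split(" ")[0]
        # sed 's/^/{"@timestamp" : "/': prepend the opening brace and key
        prefixed = '{"@timestamp" : "' + field
        # awk 'NF{...}': only print lines that contain at least one field
        # (after the prefix is added, this is always true)
        if prefixed.split():
            # awk print $0 "\", ": append the closing quote and comma
            out.append(prefixed + '", ')
    return out
```

Each of the other fields follows the same three steps with a different field number and key.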
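I left out the Severity, Thread, and Message parts because they work the same way as the others. The script is fairly simple, but you won't be able to follow it without knowing how the tools themselves work, and you said you are not familiar with them.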
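The same field-splitting idea can also be done in a single pass without temporary files. The following is a rough Python sketch under a few assumptions: the function name `line_to_record` is hypothetical, fields are separated by whitespace, Class is taken as the fourth to sixth fields (as in the question's expected output), and the component is the first bracketed block, followed by a space and the message.

```python
import json


def line_to_record(line):
    """Split one log line into the fields listed in the expected output."""
    # Split off the first six whitespace-separated fields; the remainder
    # holds the bracketed component followed by the message.
    parts = line.rstrip("\n").split(None, 6)
    if len(parts) < 7:
        return None  # not enough fields to be a log line
    component, sep, message = parts[6].partition("] ")
    if sep:
        component += "]"  # restore the bracket consumed by partition
    return {
        "@timestamp": parts[0],
        "Severity": parts[1],
        "Thread": parts[2],
        "Class": " ".join(parts[3:6]),
        "Component": component,
        "Message": message,
    }


def convert(lines):
    """Yield one JSON string per parsable line."""
    for line in lines:
        record = line_to_record(line)
        if record is not None:
            yield json.dumps(record)
```

Unlike the shell pipeline, `json.dumps` escapes the double quotes inside the component, so each output line is valid JSON.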