regex - Parsing data from a log file using a regex pattern

Problem

I have a log file full of entries like this:

2020-02-04 04:00:31,503 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceType] [ObjectType] - Information about the log

I want to use a regex pattern to retrieve the time, the last bracketed text ([ObjectType] in the example), and the message after the hyphen.

Sample input:

2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log
2020-02-04 06:50:34,465 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeBob] [ObjectTypeBob] - Information about the Bob log
2020-02-04 07:20:34,677 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeSam] [ObjectTypeSam] - Information about the Sam log

Expected output:

04:00:33,435 [ObjectTypeJohn] - Information about the John log
06:50:34,465 [ObjectTypeBob] - Information about the Bob log
07:20:34,677 [ObjectTypeSam] - Information about the Sam log

So far I have tried the following, without success:

(Get-Content Output.txt) -replace '^(\d\d:\d\d:\d\d).*(\[.*?\] - .*?)$','$1;$2'

Any help would be appreciated, thanks.

Tags: regex, powershell

Solution

You can use

(Get-Content Output.txt) -replace '^\S+\s+(\S+).*(\[[^][]*])\s*(-.*)', '$1 $2 $3'

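See the .NET regex demo.

Details

^ - start of the string
\S+ - 1+ characters other than whitespace (here, the date)
\s+ - 1+ whitespace characters
(\S+) - Group 1: 1+ characters other than whitespace (here, the time)
.* - any 0+ characters other than a newline, as many as possible
(\[[^][]*]) - Group 2: a [, then 0+ characters other than [ and ], then a ]
\s* - 0+ whitespace characters
(-.*) - Group 3: a - and the rest of the string.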
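
As a quick way to check the breakdown above, here is a minimal sketch that applies the same pattern to the three sample lines from the question. The variable names ($lines, $pattern, $result) are only illustrative, and the lines are held in an array instead of being read with Get-Content so the snippet runs on its own.

```powershell
# Sample lines copied from the question; in practice they would come from
# (Get-Content Output.txt).
$lines = @(
    '2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log'
    '2020-02-04 06:50:34,465 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeBob] [ObjectTypeBob] - Information about the Bob log'
    '2020-02-04 07:20:34,677 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeSam] [ObjectTypeSam] - Information about the Sam log'
)

# Same pattern as in the answer: skip the date, capture the time (group 1),
# let the greedy .* run to the last [...] that precedes the hyphen (group 2),
# then capture the hyphen and the message (group 3).
$pattern = '^\S+\s+(\S+).*(\[[^][]*])\s*(-.*)'

# -replace works element by element when the left operand is an array.
$result = $lines -replace $pattern, '$1 $2 $3'
$result
```

Because .* is greedy, the engine backtracks from the end of the line, so group 2 ends up on the last bracketed token that is followed by the hyphen, not on the GUID or [INFO].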
