Need a REGEX in Java to extract all WARN messages with their descriptions, which may or may not be multiline

Problem description

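I'm trying to write a regex for an input text from which I have to extract all WARN codes together with their messages. In general, a WARN entry may or may not span multiple lines, as shown below: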

[C] L1250 WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
[C] L1250 WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be
                disabled.
[C] L1250 INFO  For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t
                d above.
[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
                fix it!
[C] L1300 OK    CPU governors set as recommended
[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.

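Initially I started with the regex (WARN).*(\b|\B), which captures up to the end of the line (a word or non-word boundary), but it does not capture the following lines that continue the WARN description.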

Then I tried WARN.+([\S\s]*?)+(?=\[C\]), but this does not capture the last WARN entry, because there is no further [C] marker after it.
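
A minimal sketch of a small change to this second attempt (not part of the original question or answer): let the lookahead succeed either before a line break followed by [C], or at the end of the input (\R?\z, which also tolerates one trailing newline). With the DOTALL flag, the lazy .*? then stops at the earliest such position, so the last entry is found too. The class name LookaheadFix and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadFix {
    // DOTALL lets .*? cross line breaks. The lazy quantifier stops at the first
    // position where either the next line starts with [C], or only an optional
    // line break remains before the end of the input (\R?\z).
    private static final Pattern WARN_ENTRY =
            Pattern.compile("\\bWARN.*?(?=\\R\\[C]|\\R?\\z)", Pattern.DOTALL);

    // Hypothetical helper: collects every WARN entry, including continuation lines
    static List<String> extract(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = WARN_ENTRY.matcher(log);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently\n"
                   + "                fix it!\n"
                   + "[C] L1300 OK    CPU governors set as recommended\n"
                   + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        // Both entries are printed, including the last one with no [C] after it
        extract(log).forEach(System.out::println);
    }
}
```

This still relies on [\s\S]-style matching (via DOTALL), which the answer below avoids; it is shown only to illustrate why the original lookahead missed the last entry.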

Tags: java, regex, regex-lookarounds, multiline, regex-greedy

Solution


You can get the matches without using [\s\S]* or the single-line (DOTALL) flag by matching all following lines that do not start with [C]:

\bWARN\h+.*(?:\R(?!\[C]).*)*

Explanation

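  • \bWARN Match WARN preceded by a word boundary, so it is not part of a larger word
  • \h+.* Match 1+ horizontal whitespace characters, then the rest of the line
  • (?: Non-capturing group
    • \R(?!\[C]).* Match any Unicode line break sequence, assert that the next line does not start with [C], then match the rest of that line
  • )* Close the group and repeat it 0+ times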
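
The bullet list maps one-to-one onto the pieces of the pattern. As a sketch, the same pattern can be written with Java's Pattern.COMMENTS flag, in which whitespace in the pattern is ignored and # starts a comment that runs to the end of the line, so each piece carries its own explanation. The class name WarnRegexCommented and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarnRegexCommented {
    // Same pattern as \bWARN\h+.*(?:\R(?!\[C]).*)* , spelled out in COMMENTS mode:
    // whitespace is ignored and each '#' comment ends at the embedded "\n".
    static final Pattern WARN_ENTRY = Pattern.compile(
              "\\bWARN      # WARN as a whole word\n"
            + "\\h+ .*      # 1+ horizontal whitespace chars, then the rest of the line\n"
            + "(?:          # non-capturing group: one continuation line\n"
            + "  \\R        # any Unicode line break sequence\n"
            + "  (?!\\[C])  # the next line must not start with [C]\n"
            + "  .*         # the rest of that line\n"
            + ")*           # repeat the group 0+ times\n",
            Pattern.COMMENTS);

    // Hypothetical helper: returns each WARN entry with its continuation lines
    static List<String> extract(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = WARN_ENTRY.matcher(log);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be\n"
                   + "                disabled.\n"
                   + "[C] L1300 OK    CPU governors set as recommended\n"
                   + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        extract(log).forEach(System.out::println);
    }
}
```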

For example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "\\bWARN\\h+.*(?:\\R(?!\\[C]).*)*";
        String string = "[C] L1250 WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>\n"
             + "[C] L1250 WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be\n"
             + "                disabled.\n"
             + "[C] L1250 INFO  For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t\n"
             + "                d above.\n"
             + "[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently\n"
             + "                fix it!\n"
             + "[C] L1300 OK    CPU governors set as recommended\n"
             + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";

        // MULTILINE is harmless but not strictly needed: the pattern has no ^ or $ anchors
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);

        // Print every WARN entry together with its continuation lines
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}

Output

WARN  k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
WARN  For abcd (analytical and transactional workloads). For 12s Systems and above, should be
                disabled.
WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
                fix it!
WARN  Intel's Hyperthreading on 8+ Socket system disabled.
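
If you also need the code column (e.g. L1250) and want each multi-line description joined into a single line, a possible variation is sketched below. It assumes, as in the sample, that every entry starts at the beginning of a line with "[C] <code> <LEVEL>"; here Pattern.MULTILINE is actually required, because ^ must match at the start of every line. The class name WarnWithCode, the helper extract and the "<code>: <description>" output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarnWithCode {
    // Group 1: the code column (e.g. L1250); group 2: the description plus any
    // continuation lines that do not start with [C]. Assumes the "[C] <code> <LEVEL>" layout.
    private static final Pattern WARN_ENTRY = Pattern.compile(
            "^\\[C]\\h+(\\S+)\\h+WARN\\h+(.*(?:\\R(?!\\[C]).*)*)",
            Pattern.MULTILINE);

    // Hypothetical helper: returns "<code>: <description>", with each line break
    // and the indentation that follows it collapsed into a single space.
    static List<String> extract(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = WARN_ENTRY.matcher(log);
        while (m.find()) {
            String code = m.group(1);
            String description = m.group(2).replaceAll("\\R\\h*", " ");
            out.add(code + ": " + description);
        }
        return out;
    }

    public static void main(String[] args) {
        String log = "[C] L1250 WARN  Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently\n"
                   + "                fix it!\n"
                   + "[C] L1300 OK    CPU governors set as recommended\n"
                   + "[C] L1250 WARN  Intel's Hyperthreading on 8+ Socket system disabled.";
        extract(log).forEach(System.out::println);
    }
}
```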

If [C] is not a reliable boundary, another option is to check that the next line does not contain WARN, INFO or OK:

\bWARN\h+.*(?:\R(?!.*\h(?:WARN|INFO|OK)\h).*)*

In Java:

String regex = "\\bWARN\\h+.*(?:\\R(?!.*\\h(?:WARN|INFO|OK)\\h).*)*";
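
A minimal runnable sketch of this alternative, using a hypothetical input whose lines start with "[D]" instead of "[C]". On that input the first pattern would return everything from the first WARN to the end as a single match, because no line starts with [C]; the level-based lookahead splits it correctly. One caveat: a continuation line that itself contains a standalone WARN, INFO or OK surrounded by whitespace (e.g. "it is OK to ignore") would end the entry early. The class name WarnByLevel and the helper extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WarnByLevel {
    // A continuation line is any line that does not contain a level keyword
    // (WARN, INFO or OK) surrounded by horizontal whitespace.
    private static final Pattern WARN_ENTRY =
            Pattern.compile("\\bWARN\\h+.*(?:\\R(?!.*\\h(?:WARN|INFO|OK)\\h).*)*");

    // Hypothetical helper: returns each WARN entry with its continuation lines
    static List<String> extract(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = WARN_ENTRY.matcher(log);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: lines use "[D]" rather than "[C]"
        String log = "[D] L1250 WARN  first warning,\n"
                   + "                continued\n"
                   + "[D] L1250 INFO  some info\n"
                   + "[D] L1250 WARN  second warning";
        extract(log).forEach(System.out::println);
    }
}
```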