Regex to handle all multiline exceptions in fluentd (Rubular)

Question

I designed a regex for the fluentd parser that should match every multiline exception or warning message. The regex is:

(SLF4J:\s.*|[a-zA-z_]*\..*\.*\s.*\s.*|Caused\sby:\s|\s+at\s.*|\s+\.\.\. (\d)+ more)

It also matches fields that I do not want.

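I want to match every multiline exception or warning from its first line. In short: a multiline record should be read from the start of the file until the next JSON line is reached. A JSON line always starts with {", so reading stops as soon as a line starting with {" is seen.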
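The stopping rule described above can also be sketched without a regex. This is only an illustration under assumptions: the class name JsonBoundarySplitter and the method splitRecords are hypothetical and not part of fluentd, and the input is assumed to be already split into lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class JsonBoundarySplitter {
    // Hypothetical helper: consecutive non-JSON lines are joined into one
    // record; a line starting with {" closes the pending record and is
    // emitted as a record of its own.
    static List<String> splitRecords(List<String> lines) {
        List<String> records = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("{\"")) {
                if (!pending.isEmpty()) {
                    records.add(String.join("\n", pending));
                    pending.clear();
                }
                records.add(line);
            } else {
                pending.add(line);
            }
        }
        // Flush a multiline record that runs to the end of the input.
        if (!pending.isEmpty()) {
            records.add(String.join("\n", pending));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}");
        for (String record : splitRecords(lines)) {
            System.out.println(record);
            System.out.println("----");
        }
    }
}
```

With the sample above, the three stack trace lines end up in one record and the JSON line in a second one.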

Either one regex that covers both cases or two regexes, one per case, is fine.

Demo links

The regex is at: https://rubular.com/r/O26Wm6mc7z51re

The regex is at: https://rubular.com/r/v6Q7iwZqmNDAAx

The test string is:

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more
{"log_timestamp": "2021-02-18T11:33:23.114+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled)", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "PeerState set to LOOKING"}
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.intam.svc.cluster.local"}
java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
        at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
        at java.lang.Thread.run(Thread.java:748)
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.sxc.svc.cluster.local"}

Expected match for demo 1: https://rubular.com/r/O26Wm6mc7z51re

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more

For demo 2: https://rubular.com/r/v6Q7iwZqmNDAAx

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 

Tags: java, regex, regex-lookarounds, regex-negation, rubular

Answer


You could use a single pattern with a capture group and a backreference to get both parts:

^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|{).*)*

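The pattern matches:

  • ^ Start of a line (Ruby's ^ anchors at every line start; in Java this needs Pattern.MULTILINE)
  • (SLF4J:|java\.lang\.InterruptedException:).* Capture either alternative in group 1, then match the rest of the line
  • (?: Non-capture group
    • \R(?!\1|{).* Match a newline, assert that the next line starts with neither the text captured in group 1 nor {, then match the rest of that line
  • )* Close the group and optionally repeat it to match all following lines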

Regex demo

See the matches in red for the first part and the second part.

Note that in Java you have to double the backslashes and escape the curly brace:

String regex = "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*";
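As a minimal sketch of how that string might be used, the example below compiles it and collects each matched block. The class name MultilineBlockDemo and the method findBlocks are hypothetical, the sample log is shortened from the question, and Pattern.MULTILINE is added on the assumption that blocks can start on any line, not only at the start of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineBlockDemo {
    // Pattern from the answer. MULTILINE (an assumption added here) lets ^
    // match at the start of every line instead of only at the input start.
    static final Pattern BLOCK = Pattern.compile(
        "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*",
        Pattern.MULTILINE);

    // Hypothetical helper: returns every block the pattern matches.
    static List<String> findBlocks(String log) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(log);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}");
        for (String block : findBlocks(log)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

Because the lookahead also rejects a line that starts with the group 1 text, consecutive SLF4J: lines come back as separate matches rather than as one block.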

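To avoid running over the next SLF4J line or over a different exception type, where an exception is recognized as a dot-separated string at the start of a line: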

^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|{).*)*
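A sketch of this more general pattern in Java, under the same caveats: the class name GeneralBlockDemo and the method findBlocks are hypothetical, the curly brace is escaped because Java rejects it unescaped in this position, and Pattern.MULTILINE is an added assumption so that ^ matches at each line start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneralBlockDemo {
    // A block starts with "SLF4J:" or a dot-separated name such as
    // java.net.UnknownHostException, and continues until the next line
    // that starts with another such prefix or with {.
    static final Pattern BLOCK = Pattern.compile(
        "^(?:SLF4J:|\\w+(?:\\.\\w+)+).*"
            + "(?:\\R(?!(?:SLF4J:|\\w+(?:\\.\\w+)+)|\\{).*)*",
        Pattern.MULTILINE);

    // Hypothetical helper: returns every block the pattern matches.
    static List<String> findBlocks(String log) {
        List<String> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(log);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "java.lang.InterruptedException: Timeout while waiting for epoch from quorum",
            "        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)",
            "        ... 19 more",
            "{\"log_level\": \"WARN\", \"log_message\": \"PeerState set to LOOKING\"}",
            "java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local",
            "        at java.lang.Thread.run(Thread.java:748)",
            "{\"log_level\": \"WARN\", \"log_message\": \"Failed to resolve address\"}");
        for (String block : findBlocks(log)) {
            System.out.println(block);
            System.out.println("----");
        }
    }
}
```

One consequence of this design: any continuation line that itself begins with a dot-separated word ends the current block and starts a new one, so indented stack frames and "Caused by:" lines stay attached while an unindented exception name does not.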

Regex demo

