Fluentd does not filter as expected before writing to Elasticsearch

Problem description

I have a Spring Boot artifact with Logback configured so that, in addition to the application's STDOUT, it also sends logs to Fluentd:

<appender name="FLUENT_TEXT"
          class="ch.qos.logback.more.appenders.DataFluentAppender">
    <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
        <level>INFO</level>
    </filter>
    
    <tag>myapp</tag>
    <label>myservicename</label>
    <remoteHost>fluentdservicename</remoteHost>
    <port>24224</port>
    <useEventTime>false</useEventTime>
</appender>

The Fluentd configuration file looks like this:

<ROOT>
  <source>
    @type forward
    port 24224
    bind "0.0.0.0"
  </source>

  <filter myapp.**>
    @type parser
    key_name "message"
    reserve_data true
    remove_key_name_field false
    <parse>
      @type "json"
    </parse>
  </filter>

  <match myapp.**>
    @type copy
    <store>
      @type "elasticsearch"
      host "elasticdb"
      port 9200
      logstash_format true
      logstash_prefix "applogs"
      logstash_dateformat "%Y%m%d"
      include_tag_key true
      type_name "app_log"
      tag_key "@log_name"
      flush_interval 1s
      user "elastic"
      password xxxxxx
      <buffer>
        flush_interval 1s
      </buffer>
    </store>
    <store>
      @type "stdout"
    </store>
  </match>
</ROOT>

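So it just adds a filter that parses the message field (a JSON string) into structured fields, then writes the result to Elasticsearch (and to Fluentd's STDOUT). Note that I used the myapp.** tag pattern (a tag match pattern, not a regular expression) in both the filter block and the match block.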

Everything is up and running in OpenShift. Spring Boot sends the logs to Fluentd correctly, and Fluentd writes them to Elasticsearch.

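The problem is that every log generated by the application is written as well. That means every INFO log (for example the initial Spring configuration, or anything else the application sends through Logback) also ends up in Elasticsearch.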

Example of a "wanted" log:

2020-11-04 06:33:42.312840352 +0000 myapp.myservice: {"traceId":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","spanId":"f597f7ffbe722fa7","spanExportable":"false","X-Span-Export":"false","level":"INFO","X-B3-SpanId":"f597f7ffbe722fa7","idOrq":"bf8195d9-16dd-4e58-a0aa-413d89a1eca9","logger":"es.organization.project.myapp.commons.services.impl.LoggerServiceImpl","X-B3-TraceId":"f597f7ffbe722fa7","thread":"http-nio-8085-exec-1","message":"{\"traceId\":\"bf8195d9-16dd-4e58-a0aa-413d89a1eca9\",\"inout\":\"IN\",\"startTime\":1604471622281,\"finishTime\":null,\"executionTime\":null,\"entrySize\":5494.0,\"exitSize\":null,\"differenceSize\":null,\"user\":\"pmmartin\",\"methodPath\":\"Method Path\",\"errorMessage\":null,\"className\":\"CamelOrchestrator\",\"methodName\":\"preauthorization_validate\"}","idOp":"","inout":"IN","startTime":1604471622281,"finishTime":null,"executionTime":null,"entrySize":5494.0,"exitSize":null,"differenceSize":null,"user":"pmmartin","methodPath":"Method Path","errorMessage":null,"className":"CamelOrchestrator","methodName":"preauthorization_validate"}

Examples of "unwanted" logs (note the Fluentd warning that appears for each unexpected log message):

2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.InternalRouteStartupManager","thread":"restartedMain","message":"Route: route6 started and consuming from: servlet:/preAuth"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Total 20 routes, of which 20 are started'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Total 20 routes, of which 20 are started"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"org.apache.camel.impl.engine.AbstractCamelContext", "thread"=>"restartedMain", "message"=>"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09.000000000 +0000 myapp.myservice: {"level":"INFO","logger":"org.apache.camel.impl.engine.AbstractCamelContext","thread":"restartedMain","message":"Apache Camel 3.5.0 (MyService DEMO Mode) started in 0.036 seconds"}
2020-11-04 06:55:09 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'Started MyServiceApplication in 15.446 seconds (JVM running for 346.061)'" location=nil tag="myapp.myservice" time=1604472909 record={"level"=>"INFO", "logger"=>"es.organization.project.myapp.MyService", "thread"=>"restartedMain", "message"=>"Started MyService in 15.446 seconds (JVM running for 346.061)"}

The question is: what do I have to tell Fluentd, and how, so that it really filters the incoming events and discards the unwanted ones?

Tags: spring-boot, elasticsearch, logback, fluentd

Solution


Thanks to @Azeem, and based on the documentation for the grep filter's regexp feature, I figured it out :).

I just added this to my Fluentd configuration file:

<filter myapp.**>
  @type grep
  <regexp>
    key message
    pattern /^.*inout.*$/
  </regexp>
</filter>

Now any event whose message field does not contain the word "inout" is excluded.
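
The answer does not say where the grep filter sits relative to the parser filter. As far as I know, Fluentd applies filters in the order they appear in the file, so one possible arrangement is to put grep first; the plain-text messages are then dropped before the JSON parser sees them, which should also stop the "pattern not matched" warnings. The following is only a sketch under those assumptions: it reuses the myapp.** tag and the parser settings from the question, and the shortened pattern /inout/ is my simplification (an unanchored pattern should match the same events as /^.*inout.*$/ for single-line messages).

```
# Sketch only: assumes the myapp.** tag from the question.
# grep comes first so that non-matching events never reach the parser.
<filter myapp.**>
  @type grep
  <regexp>
    key message
    # Keep only events whose "message" field contains "inout".
    pattern /inout/
  </regexp>
</filter>

# Only events that passed the grep filter are parsed as JSON.
<filter myapp.**>
  @type parser
  key_name message
  reserve_data true
  remove_key_name_field false
  <parse>
    @type json
  </parse>
</filter>
```

The existing match block stays unchanged after these two filters. Note that matching on the substring "inout" is a loose test: any free-text message that happens to contain that word would pass the grep filter and then fail JSON parsing.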

