How to define a field in a grok regex in fluentd

Problem description

I have the following Apache Atlas audit log:

[INFO] 2020-06-29 15:14:31,732 AUDIT logJSON - {"repoType":15,"repo":"atlas","reqUser":"varun","evtTime":"2020-06-29 15:14:29.967","access":"entity-read","resource":"AtlanColumn/[]/glue/78975568964/flights/default/flightsgdelt_100m_test_partition/c_11","resType":"entity","action":"entity-read","result":1,"agent":"atlas","policy":6,"enforcer":"ranger-acl","cliIP":"10.9.2.76","agentHost":"atlas-7d9dcdd6c5-lmfzj","logType":"RangerAudit","id":"87c9e862-910b-4ee2-86f8-cb174f4e7b76-863129","seq_num":1701441,"event_count":1,"event_dur_ms":0,"tags":[],"cluster_name":"","policy_version":54}

Right now I have the following parse configuration:

        <parse>
          @type regexp
          expression ^\[(?<Level>.[^ ]*)\] (?<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?<Type>.[^ ]*) (?<Action>.[^ ]*) \- \{"repoType":(?<repoType>.[^ ]*)\,"repo":"(?<repo>.[^ ]*)\","reqUser":"(?<reqUser>.[^ ]*)\","evtTime":"(?<evtTime>.[^ ].*)\","access":"(?<access>.[^ ]*)\","resource":"(?<resource>.[^ ].*)\","resType":"(?<resType>.[^ ]*)\","action":"(?<action>.[^ ]*)\","result":(?<result>.[^ ]*)\,"agent":"(?<agent>.[^ ].*)\","policy":(?<policy>.[^ ]*)\,"enforcer":"(?<enforcer>.[^ ]*)\","cliIP":"(?<cliIP>.[^ ]*)\","agentHost":"(?<agentHost>.[^ ]*)\","logType":"(?<logType>.[^ ]*)\","id":"(?<id>.[^ ]*)\","seq_num":(?<seq_num>.[^ ]*)\,"event_count":(?<event_count>.[^ ]*)\,"event_dur_ms":(?<event_dur_ms>.[^ ]*)\,"tags":(?<tags>.[^ ].*)\,"cluster_name":(?<cluster_name>.[^ ].*),"policy_version":(?<policy_version>.[^ ]*)\}
        </parse>

Now we want to break the resource field down further into multiple fields, as follows:

AssetType
Tags
Integration
Database
Schema
Table
Column

The problem is that the resource field does not always contain all of the components above. It can be any of the following:

AssetType/Tags/Integration
AssetType/Tags/Integration/Database
AssetType/Tags/Integration/Database/Schema
AssetType/Tags/Integration/Database/Schema/Table
AssetType/Tags/Integration/Database/Schema/Table/Column

If any field is missing, we should send null for it.

Any suggestions or guidance on this would be greatly appreciated.

Tags: elasticsearch, fluentd, apache-atlas

Solution


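You can use the record_reformer plugin to parse the resource key and extract the value needed for each field. Each expression skips a fixed number of leading "segment/" groups and captures the next segment; when the path is too short, nothing matches and the expression evaluates to nil. Here is a usage example: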

  <match pattern.**>
    @type record_reformer
    tag new_tag.${tag_suffix[2]}
    # Keep the existing fields and add the new ones alongside them.
    renew_record false
    # Allow Ruby expressions inside ${...}.
    enable_ruby true
    <record>
      # {n} skips n leading "segment/" groups; the named group captures segment n+1.
      AssetType ${record['resource'].scan(/^([^\/]+\/){0}(?<param>[^\/]+)/).flatten.compact[0]}
      Tags ${record['resource'].scan(/^([^\/]+\/){1}(?<param>[^\/]+)/).flatten.compact[0]}
      Integration ${record['resource'].scan(/^([^\/]+\/){2}(?<param>[^\/]+)/).flatten.compact[0]}
      Database ${record['resource'].scan(/^([^\/]+\/){3}(?<param>[^\/]+)/).flatten.compact[0]}
      Schema ${record['resource'].scan(/^([^\/]+\/){4}(?<param>[^\/]+)/).flatten.compact[0]}
      Table ${record['resource'].scan(/^([^\/]+\/){5}(?<param>[^\/]+)/).flatten.compact[0]}
      Column ${record['resource'].scan(/^([^\/]+\/){6}(?<param>[^\/]+)/).flatten.compact[0]}
    </record>
  </match>
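
To check the behaviour outside Fluentd, the extraction can be sketched in plain Ruby. This is only an illustration, not part of record_reformer: `split_resource` and `nth_segment` are hypothetical helper names, and the field order is assumed to be the one listed in the question. `nth_segment` mirrors the `scan` expression used in the config, while `split_resource` shows a simpler split-based variant that fills missing positions with nil.

```ruby
# Field names in the order they are assumed to appear in the resource path.
FIELDS = %w[AssetType Tags Integration Database Schema Table Column].freeze

# Hypothetical helper: map each field name to its path segment,
# or to nil when the path is too short (or the segment is empty).
def split_resource(resource)
  segments = resource.to_s.split('/')
  FIELDS.each_with_index.map do |name, i|
    segment = segments[i]
    [name, segment.nil? || segment.empty? ? nil : segment]
  end.to_h
end

# Hypothetical helper: same expression as in the config, for position n (0-based).
# Because the pattern has a named group, Ruby does not capture the unnamed group,
# so scan yields only the named capture.
def nth_segment(resource, n)
  resource.scan(/^([^\/]+\/){#{n}}(?<param>[^\/]+)/).flatten.compact[0]
end

fields = split_resource('AtlanColumn/[]/glue')
puts fields['Integration'].inspect
puts fields['Database'].inspect
```

Segments beyond the seventh are ignored by this sketch. The sample log line in the question has eight slash-separated segments, so how the extra segment should map onto the seven fields is left open here.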
