elasticsearch - How to define a field in a grok regex in fluentd
Problem description
I have the following Apache Atlas audit log:
[INFO] 2020-06-29 15:14:31,732 AUDIT logJSON - {"repoType":15,"repo":"atlas","reqUser":"varun","evtTime":"2020-06-29 15:14:29.967","access":"entity-read","resource":"AtlanColumn/[]/glue/78975568964/flights/default/flightsgdelt_100m_test_partition/c_11","resType":"entity","action":"entity-read","result":1,"agent":"atlas","policy":6,"enforcer":"ranger-acl","cliIP":"10.9.2.76","agentHost":"atlas-7d9dcdd6c5-lmfzj","logType":"RangerAudit","id":"87c9e862-910b-4ee2-86f8-cb174f4e7b76-863129","seq_num":1701441,"event_count":1,"event_dur_ms":0,"tags":[],"cluster_name":"","policy_version":54}
Right now I have the following parse configuration:
<parse>
@type regexp
expression ^\[(?<Level>.[^ ]*)\] (?<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?<Type>.[^ ]*) (?<Action>.[^ ]*) \- \{"repoType":(?<repoType>.[^ ]*)\,"repo":"(?<repo>.[^ ]*)\","reqUser":"(?<reqUser>.[^ ]*)\","evtTime":"(?<evtTime>.[^ ].*)\","access":"(?<access>.[^ ]*)\","resource":"(?<resource>.[^ ].*)\","resType":"(?<resType>.[^ ]*)\","action":"(?<action>.[^ ]*)\","result":(?<result>.[^ ]*)\,"agent":"(?<agent>.[^ ].*)\","policy":(?<policy>.[^ ]*)\,"enforcer":"(?<enforcer>.[^ ]*)\","cliIP":"(?<cliIP>.[^ ]*)\","agentHost":"(?<agentHost>.[^ ]*)\","logType":"(?<logType>.[^ ]*)\","id":"(?<id>.[^ ]*)\","seq_num":(?<seq_num>.[^ ]*)\,"event_count":(?<event_count>.[^ ]*)\,"event_dur_ms":(?<event_dur_ms>.[^ ]*)\,"tags":(?<tags>.[^ ].*)\,"cluster_name":(?<cluster_name>.[^ ].*),"policy_version":(?<policy_version>.[^ ]*)\}
</parse>
We now want to break the resource field down further into multiple fields, as follows:
AssetType
Tags
Integration
Database
Schema
Table
Column
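The problem is that the resource field does not always contain all of the components above. It can be AssetType/Tags/Integration, or AssetType/Tags/Integration/Database, or AssetType/Tags/Integration/Database/Schema, or AssetType/Tags/Integration/Database/Schema/Table, or AssetType/Tags/Integration/Database/Schema/Table/Column.
If any field is missing, we should send null for it.
Any suggestions or guidance would be greatly appreciated.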
Solution
You can use the record_reformer plugin to parse the resource key and extract the value for each field you need. Here is a usage example:
<match pattern.**>
@type record_reformer
tag new_tag.${tag_suffix[2]}
renew_record false
enable_ruby true
<record>
AssetType ${record['resource'].scan(/^([^\/]+\/){0}(?<param>[^\/]+)/).flatten.compact[0]}
Tags ${record['resource'].scan(/^([^\/]+\/){1}(?<param>[^\/]+)/).flatten.compact[0]}
Integration ${record['resource'].scan(/^([^\/]+\/){2}(?<param>[^\/]+)/).flatten.compact[0]}
Database ${record['resource'].scan(/^([^\/]+\/){3}(?<param>[^\/]+)/).flatten.compact[0]}
Schema ${record['resource'].scan(/^([^\/]+\/){4}(?<param>[^\/]+)/).flatten.compact[0]}
Table ${record['resource'].scan(/^([^\/]+\/){5}(?<param>[^\/]+)/).flatten.compact[0]}
Column ${record['resource'].scan(/^([^\/]+\/){6}(?<param>[^\/]+)/).flatten.compact[0]}
</record>
</match>
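To check the expressions outside Fluentd, the same scan call can be tried in plain Ruby. The following is a minimal sketch, not part of the plugin: the helper names resource_segment and split_resource and the FIELDS constant are made up for illustration, and the field order is assumed to be the one listed in the question. Because the pattern contains a named group, Ruby captures only that group, so scan returns either the requested segment or an empty array, which yields nil for a missing field.

```ruby
# Field names in the order they are assumed to appear in the resource path.
FIELDS = %w[AssetType Tags Integration Database Schema Table Column].freeze

# Hypothetical helper: returns the path segment at position `index`,
# or nil when the path has fewer segments. It mirrors the expression
# used in the <record> block above.
def resource_segment(resource, index)
  pattern = /^([^\/]+\/){#{index}}(?<param>[^\/]+)/
  resource.scan(pattern).flatten.compact[0]
end

# Hypothetical helper: maps every field name to its segment (or nil).
def split_resource(resource)
  FIELDS.each_with_index.map { |name, i| [name, resource_segment(resource, i)] }.to_h
end

fields = split_resource("AtlanColumn/[]/glue/78975568964")
puts fields["AssetType"]    # AtlanColumn
puts fields["Database"]     # 78975568964
puts fields["Schema"].nil?  # true
```

A nil value produced this way is normally serialized as JSON null when the record is sent on, which is what the question asks for, but it is worth confirming against your own output plugin.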