How do I add an array of dictionaries with a Logstash mutate filter from a CSV?

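Problem description

I have written a Logstash configuration file to upload a CSV. Each CSV row holds information for several applicants, and I need that information to arrive in the Kibana index as an array of dictionaries, not as a dictionary of dictionaries keyed by index.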

filter {
    csv {
        separator => ","
        skip_header => true
        columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
    }
    mutate { 
        convert => {
            "Applicant_Income1" => "float"
            "Time_At_Work1" => "float"
            "LoanAmount" => "float"
            "Applicant_Income2" => "float"
            "Time_At_Work2" => "float"
            "Applicant_Income3" => "float"
            "Time_At_Work3" => "float"
            }
        } 
    mutate{
        rename => {
            "Applicant_Income1" => "[Applicant][0][Applicant_Income]"
            "Occupation1" => "[Applicant][0][Occupation]"
            "Time_At_Work1" => "[Applicant][0][Time_At_Work]"
            "Date_Of_Join1" => "[Applicant][0][Date_Of_Join]"
            "Applicant_Income2" => "[Applicant][1][Applicant_Income]"
            "Occupation2" => "[Applicant][1][Occupation]"
            "Time_At_Work2" => "[Applicant][1][Time_At_Work]"
            "Date_Of_Join2" => "[Applicant][1][Date_Of_Join]"
            "Applicant_Income3" => "[Applicant][2][Applicant_Income]"
            "Occupation3" => "[Applicant][2][Occupation]"
            "Time_At_Work3" => "[Applicant][2][Time_At_Work]"
            "Date_Of_Join3" => "[Applicant][2][Date_Of_Join]"
            }
        }   
    date {
        match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
        }   
    date {
        match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
      } 
    date {
        match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]      
      }       
    }

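My Applicant field currently looks like this:

[image 1: screenshot not preserved]

But I need the Applicant field to be an array of dictionaries, for example:

[image 2: screenshot not preserved]

I tried add_field, but it did not work: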

    mutate{
        add_field => {  "[Applicant][Applicant_Income1]" => "Applicant_Income1",
                    "[Applicant][Occupation1]" => "Occupation1",
                "[Applicant][Time_At_Work1]" => "Time_At_Work1",
                "[Applicant][Date_Of_Join1]" => "Date_Of_Join1"
                        }
        }

Tags: logstash, logstash-configuration

Solution


In Logstash filters, square brackets do not index array elements the way they do in other programming languages (for example Java).

[Applicant][0][Applicant_Income]

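is not the correct syntax for setting the Applicant_Income field of the first element (zero-based index) of the Applicant array. Instead, it creates the sub-elements 0, 1, 2 under the Applicant element, as shown in the first screenshot in the question.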
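To make the difference concrete, here is a minimal plain-Ruby sketch of the two document shapes. The occupation values are made-up sample data, and the first structure is an assumption about what the rename produced, based on the description above.

```ruby
require "json"

# Shape produced by renaming to [Applicant][0][...]: a hash whose keys are
# the strings "0", "1", ... (assumed from the description above).
keyed_by_index = {
  "Applicant" => {
    "0" => { "Occupation" => "Engineer" },
    "1" => { "Occupation" => "Teacher" }
  }
}

# Shape the question asks for: a real array of objects.
array_of_objects = {
  "Applicant" => [
    { "Occupation" => "Engineer" },
    { "Occupation" => "Teacher" }
  ]
}

puts JSON.generate(keyed_by_index)
puts JSON.generate(array_of_objects)
```

The first line printed has `"0"` and `"1"` as object keys, while the second has a JSON array under `Applicant`.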

To create an array of objects, you should use the ruby filter plugin ( https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html ). Because that filter lets you execute arbitrary Ruby code, it gives you more control and freedom:

filter {
  csv {
    separator => ","
    skip_header => true
    columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
  }

  mutate { 
    convert => {
      "Applicant_Income1" => "float"
      "Time_At_Work1" => "float"
      "LoanAmount" => "float"
      "Applicant_Income2" => "float"
      "Time_At_Work2" => "float"
      "Applicant_Income3" => "float"
      "Time_At_Work3" => "float"
    }
  } 

  ruby{
    code => '
      event.set("Applicant", 
       [
        {
         "Applicant_Income" => event.get("Applicant_Income1"),
         "Occupation" => event.get("Occupation1"), 
         "Time_At_Work" => event.get("Time_At_Work1"),
         "Date_Of_Join" => event.get("Date_Of_Join1")
        },
        {
           # next object...
        }
       ])
    '
  }

  date {
    match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  } 

  date {
    match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  } 

  date {
    match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ] 
  } 

  mutate{
    remove_field => [
      "Applicant_Income1",
      "Occupation1",
      "Time_At_Work1",
      "Date_Of_Join1",
      "Applicant_Income2",
      "Occupation2",
      "Time_At_Work2",
      "Date_Of_Join2",
      "Applicant_Income3",
      "Occupation3",
      "Time_At_Work3",
      "Date_Of_Join3"
    ]
  } 
}

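With event.set you add a field to the document. The first argument is the field name, the second is its value. In this case, you add the field "Applicant" to the document, with an array of objects as its value.

event.get is used to read the value of a field in the document. You retrieve the value by passing the field name to the method.

See this guide https://www.elastic.co/guide/en/logstash/current/event-api.html for more information about the Event API.

I hope this helps.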
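Because the three applicants share the same column names apart from the numeric suffix, the array can also be built in a loop instead of writing each object by hand. The following is a sketch only: `FakeEvent` is a hypothetical stand-in for the Logstash event (backed by a plain Hash) so the snippet runs outside Logstash, `build_applicants` and `SUFFIXED_FIELDS` are illustrative names, and the sample row is made up.

```ruby
# Hypothetical stand-in for the Logstash event, backed by a plain Hash.
# It only handles top-level field names, which is all this sketch needs.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Column name stems that repeat per applicant with a 1, 2, 3 suffix.
SUFFIXED_FIELDS = %w[Applicant_Income Occupation Time_At_Work Date_Of_Join].freeze

# Returns one hash per applicant, reading e.g. "Occupation1", "Occupation2", ...
def build_applicants(event, count)
  (1..count).map do |i|
    SUFFIXED_FIELDS.each_with_object({}) do |field, applicant|
      applicant[field] = event.get("#{field}#{i}")
    end
  end
end

# Made-up sample row; the third applicant's columns are missing on purpose.
row = {
  "LoanID" => "LP001",
  "Applicant_Income1" => 5000.0, "Occupation1" => "Engineer",
  "Time_At_Work1" => 3.5, "Date_Of_Join1" => "2019-01-15T00:00:00.00+00:00",
  "Applicant_Income2" => 4200.0, "Occupation2" => "Teacher",
  "Time_At_Work2" => 7.0, "Date_Of_Join2" => "2015-06-01T00:00:00.00+00:00"
}

event = FakeEvent.new(row)
event.set("Applicant", build_applicants(event, 3))
```

Fields that are absent from the row come back as nil, so the third applicant here is a hash of nil values. Inside a real ruby filter, the same loop would call `event.get` and `event.set` on the actual event, and you may want to skip applicants whose values are all nil.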

