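logstash - How do I add an array of dictionaries from a CSV with the Logstash mutate filter?
Problem description
I have written a Logstash config file to upload a CSV. The CSV holds information for several applicants per row, and I need it to end up in the Kibana index as an array of dictionaries rather than as a dictionary of dictionaries keyed by index.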
filter {
  csv {
    separator => ","
    skip_header => true
    columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
  }
  mutate {
    convert => {
      "Applicant_Income1" => "float"
      "Time_At_Work1" => "float"
      "LoanAmount" => "float"
      "Applicant_Income2" => "float"
      "Time_At_Work2" => "float"
      "Applicant_Income3" => "float"
      "Time_At_Work3" => "float"
    }
  }
  mutate {
    rename => {
      "Applicant_Income1" => "[Applicant][0][Applicant_Income]"
      "Occupation1" => "[Applicant][0][Occupation]"
      "Time_At_Work1" => "[Applicant][0][Time_At_Work]"
      "Date_Of_Join1" => "[Applicant][0][Date_Of_Join]"
      "Applicant_Income2" => "[Applicant][1][Applicant_Income]"
      "Occupation2" => "[Applicant][1][Occupation]"
      "Time_At_Work2" => "[Applicant][1][Time_At_Work]"
      "Date_Of_Join2" => "[Applicant][1][Date_Of_Join]"
      "Applicant_Income3" => "[Applicant][2][Applicant_Income]"
      "Occupation3" => "[Applicant][2][Occupation]"
      "Time_At_Work3" => "[Applicant][2][Time_At_Work]"
      "Date_Of_Join3" => "[Applicant][2][Date_Of_Join]"
    }
  }
  date {
    match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
  date {
    match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
  date {
    match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
}
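With this config, my Applicant field comes out as a dictionary keyed by index (0, 1, 2), but I need the Applicant field to be an array of dictionaries.
I also tried add_field, but it did not work: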
mutate {
  add_field => {
    "[Applicant][Applicant_Income1]" => "Applicant_Income1",
    "[Applicant][Occupation1]" => "Occupation1",
    "[Applicant][Time_At_Work1]" => "Time_At_Work1",
    "[Applicant][Date_Of_Join1]" => "Date_Of_Join1"
  }
}
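Solution
Square brackets in Logstash filters do not behave like array element access in other programming languages (for example Java). [Applicant][0][Applicant_Income] is not the correct syntax for setting the Applicant_Income field of the first element (zero-based index) of the Applicant array. Instead, you are creating the sub-elements 0, 1 and 2 under the Applicant element, which is the output shown in the question.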
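To make the difference concrete, here is a small plain-Ruby sketch (not Logstash config) that contrasts the two shapes. The sample values are made up for illustration. The first structure mirrors what the rename in the question ends up producing, with "0" and "1" as ordinary keys; the second is the array of objects that was actually wanted.

```ruby
require "json"

# Shape produced by the rename: "0" and "1" are hash keys, not array indexes.
# All values below are made-up sample data.
nested_by_key = {
  "Applicant" => {
    "0" => { "Applicant_Income" => 5000.0, "Occupation" => "Engineer" },
    "1" => { "Applicant_Income" => 3200.0, "Occupation" => "Teacher" }
  }
}

# Shape that was wanted: a real array of objects.
array_of_objects = {
  "Applicant" => [
    { "Applicant_Income" => 5000.0, "Occupation" => "Engineer" },
    { "Applicant_Income" => 3200.0, "Occupation" => "Teacher" }
  ]
}

puts JSON.generate(nested_by_key)
puts JSON.generate(array_of_objects)

# The keyed form can be converted to the array form by ordering on the numeric key.
converted = nested_by_key["Applicant"]
            .sort_by { |key, _| key.to_i }
            .map { |_, value| value }
```

In the JSON output the first form serializes Applicant with curly braces and the second with square brackets, which is the difference that matters once the document is indexed.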
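To create an array of objects, you should use the ruby filter plugin ( https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html ). Because that filter lets you execute arbitrary Ruby code, it gives you more control and freedom: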
filter {
  csv {
    separator => ","
    skip_header => true
    columns => [LoanID,Applicant_Income1,Occupation1,Time_At_Work1,Date_Of_Join1,Gender,LoanAmount,Marital_Status,Dependents,Education,Self_Employed,Applicant_Income2,Occupation2,Time_At_Work2,Date_Of_Join2,Applicant_Income3,Occupation3,Time_At_Work3,Date_Of_Join3]
  }
  mutate {
    convert => {
      "Applicant_Income1" => "float"
      "Time_At_Work1" => "float"
      "LoanAmount" => "float"
      "Applicant_Income2" => "float"
      "Time_At_Work2" => "float"
      "Applicant_Income3" => "float"
      "Time_At_Work3" => "float"
    }
  }
  ruby {
    # Build the Applicant field as an array of objects from the flat CSV columns.
    code => '
      event.set("Applicant",
        [
          {
            "Applicant_Income" => event.get("Applicant_Income1"),
            "Occupation" => event.get("Occupation1"),
            "Time_At_Work" => event.get("Time_At_Work1"),
            "Date_Of_Join" => event.get("Date_Of_Join1")
          },
          {
            # next object...
          }
        ]
      )
    '
  }
  date {
    match => [ "Date_Of_Join1", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
  date {
    match => [ "Date_Of_Join2", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
  date {
    match => [ "Date_Of_Join3", "yyyy-MM-dd'T'HH:mm:ss.SSZZ" ]
  }
  # Drop the flat per-applicant fields once they have been copied into the array.
  mutate {
    remove_field => [
      "Applicant_Income1",
      "Occupation1",
      "Time_At_Work1",
      "Date_Of_Join1",
      "Applicant_Income2",
      "Occupation2",
      "Time_At_Work2",
      "Date_Of_Join2",
      "Applicant_Income3",
      "Occupation3",
      "Time_At_Work3",
      "Date_Of_Join3"
    ]
  }
}
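With event.set you add a field to the document. The first argument is the field name, the second is its value. In this case you add the field "Applicant" to the document, with an array of objects as its value.
event.get is used to read the value of a field in the document. You retrieve the value by passing the field name to the method.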
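Since the three applicants share the same four column names, the array could also be built in a loop instead of writing each object by hand. The following is only a sketch under a few assumptions: FakeEvent is a hypothetical stand-in for the Logstash event so the snippet runs outside Logstash, collect_applicants is an illustrative name, the sample row is made up, and leaving out applicants whose columns are all empty is my assumption about the desired result.

```ruby
# Hypothetical stand-in for the Logstash event, only so this sketch runs
# outside Logstash. Inside a ruby filter, `event` is provided for you.
class FakeEvent
  def initialize(fields)
    @fields = fields.dup
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# The body of this method is what would go into the ruby filter's `code` option.
def collect_applicants(event, count = 3)
  # Per-applicant column names, taken from the csv filter's column list.
  keys = %w[Applicant_Income Occupation Time_At_Work Date_Of_Join]

  applicants = (1..count).map do |i|
    keys.each_with_object({}) do |key, applicant|
      value = event.get("#{key}#{i}") # e.g. "Occupation2"
      # Assumption: missing or empty columns are skipped.
      applicant[key] = value unless value.nil? || value == ""
    end
  end

  # Assumption: applicants with no values at all are left out of the array.
  event.set("Applicant", applicants.reject(&:empty?))
end

# Made-up sample row with two applicants; the third set of columns is absent.
event = FakeEvent.new(
  "LoanID" => "LP001",
  "Applicant_Income1" => 5000.0, "Occupation1" => "Engineer",
  "Time_At_Work1" => 4.5, "Date_Of_Join1" => "2015-03-01T00:00:00.00+0000",
  "Applicant_Income2" => 3200.0, "Occupation2" => "Teacher",
  "Time_At_Work2" => 2.0, "Date_Of_Join2" => "2018-09-15T00:00:00.00+0000"
)
collect_applicants(event)
```

The sketch only reads the numbered fields with event.get, so the mutate remove_field block from the config above would still be needed to drop them afterwards.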
See the guide at https://www.elastic.co/guide/en/logstash/current/event-api.html for more information about the Event API.
I hope this helps.