elasticsearch - How to remove duplicate documents and fields with empty values
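Problem description
My CSV file contains duplicate rows, some of which have empty values. I want to drop the empty fields from these records and fill them in from the other records for the same person. I managed to remove the fields with empty values, but I could not get the records merged. Please help!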
My csv file =>
name,surname,age,email,phone
Busra,Duygu,99,,05555555555
Busra,Duygu,,busraduygu@gmail.com,
Busra,Duygu,99,,
Busra,Duygu,,,
In other words, my CSV file repeats the same person's information several times, and some of those records have empty values. The output I want:
Büşra Duygu,99,busraduygu@gmail.com,05555555555
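The desired output implies a merge rule: for each name and surname, keep the first non-empty value of every other column. A minimal Ruby sketch of that rule, run outside Elasticsearch, assuming plain rows with no quoted or embedded commas; `merge_rows` is a hypothetical helper, and it keeps the input spelling "Busra" rather than "Büşra".

```ruby
# Hypothetical helper: group rows by name + surname and keep, for every other
# column, the first non-empty value seen. Assumes no quoted or embedded commas.
CSV_TEXT = <<~DATA
  name,surname,age,email,phone
  Busra,Duygu,99,,05555555555
  Busra,Duygu,,busraduygu@gmail.com,
  Busra,Duygu,99,,
  Busra,Duygu,,,
DATA

def merge_rows(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  header = lines.shift.split(",", -1)   # -1 keeps trailing empty columns
  merged = {}
  lines.each do |line|
    row = header.zip(line.split(",", -1)).to_h
    target = (merged[[row["name"], row["surname"]]] ||= {})
    row.each do |field, value|
      # Empty strings are skipped, so they never overwrite a known value.
      target[field] = value unless value.empty? || target.key?(field)
    end
  end
  merged.values
end

merged = merge_rows(CSV_TEXT)
merged.each do |r|
  puts r.values_at("name", "surname", "age", "email", "phone").join(",")
end
# prints: Busra,Duygu,99,busraduygu@gmail.com,05555555555
```

"First non-empty value wins" is an assumption; the sample rows never disagree on a non-empty value, so any tie-breaking rule gives the same result here.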
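To do this, I first loaded the CSV file into the null_problem index, then created an index named null_problem_fingerprint to merge these duplicate documents with the fingerprint method, but it did not work.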
null_problem index =>
input {
  file {
    path => ".../null_problem.csv"
    start_position => "beginning"
    # Windows null device: no sincedb is kept, so the file is re-read on every run
    sincedb_path => "NUL"
  }
}
filter {
  # Split each line into name, surname, age, email, phone
  csv {
    autodetect_column_names => "true"
    separator => ","
    skip_header => "true"
    columns => ["name","surname","age","email","phone"]
  }
  # Drop fields that should not be indexed
  mutate {
    remove_field => ["path", "host", "message", "@version", "@timestamp", "trade_date"]
  }
  # Collect every field path, then remove fields whose value is nil or empty
  ruby {
    code => "
      def walk_hash(parent, path, hash)
        path << parent if parent
        hash.each do |key, value|
          walk_hash(key, path, value) if value.is_a?(Hash)
          @paths << (path + [key]).map {|p| '[' + p + ']' }.join('')
        end
        path.pop
      end
      @paths = []
      walk_hash(nil, [], event.to_hash)
      @paths.each do |path|
        value = event.get(path)
        event.remove(path) if value.nil? || (value.respond_to?(:empty?) && value.empty?)
      end
    "
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "null_problem"
    document_type => "_doc"
  }
  stdout {}
}
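The `ruby` filter above records every field path (children before their parent) and then removes each field whose value is nil or empty. The same idea on a plain Ruby Hash, as a sketch: `drop_empty_fields` is a hypothetical name, and a real pipeline works on a `LogStash::Event` rather than a Hash. If only the top-level CSV columns matter, the csv filter's `skip_empty_columns => true` option should have a similar effect without custom Ruby.

```ruby
# Sketch only: operates on a plain Hash standing in for a Logstash event.
# Removes every key whose value is nil or empty, recursing into nested hashes.
def drop_empty_fields(hash)
  hash.each_with_object({}) do |(key, value), out|
    # Clean nested hashes first, so a hash left empty is dropped as well
    # (the ruby filter gets the same effect by removing children before parents).
    value = drop_empty_fields(value) if value.is_a?(Hash)
    next if value.nil? || (value.respond_to?(:empty?) && value.empty?)
    out[key] = value
  end
end

event = {
  "name" => "Busra", "surname" => "Duygu", "age" => nil,
  "email" => "", "phone" => "05555555555", "geo" => { "city" => "" }
}
p drop_empty_fields(event)
```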
null_problem_fingerprint index =>
input {
  elasticsearch {
    hosts => "localhost"
    index => "null_problem"
    query => '{ "sort": [ "_doc" ] }'
  }
}
filter {
  fingerprint {
    method => "SHA1"
    source => ["name","surname","age","email","phone"]
    target => "[@metadata][generated_id]"
    concatenate_sources => "true"
  }
  mutate {
    remove_field => ["path", "host", "message", "@version", "@timestamp", "trade_date"]
  }
}
output {
  stdout { codec => dots }
  elasticsearch {
    index => "null_problem_fingerprint"
    document_id => "%{[@metadata][generated_id]}"
    doc_as_upsert => "true"
    action => "update"
  }
}
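A likely reason this pipeline cannot merge the rows: `source` lists all five columns, so rows that differ only in which fields are empty hash to different values and end up as separate documents. The sketch below illustrates this with Ruby's `Digest::SHA1`. The `fingerprint` helper and its `field=value|...` input string are assumptions for illustration and do not reproduce the exact string the Logstash fingerprint filter hashes.

```ruby
require "digest"

# Hypothetical helper: SHA1 over the listed fields joined as "field=value|...".
# Missing fields contribute an empty value.
def fingerprint(row, fields)
  Digest::SHA1.hexdigest(fields.map { |f| "#{f}=#{row[f]}" }.join("|"))
end

# The four CSV rows after empty fields have been removed.
rows = [
  { "name" => "Busra", "surname" => "Duygu", "age" => "99", "phone" => "05555555555" },
  { "name" => "Busra", "surname" => "Duygu", "email" => "busraduygu@gmail.com" },
  { "name" => "Busra", "surname" => "Duygu", "age" => "99" },
  { "name" => "Busra", "surname" => "Duygu" }
]

ids_all  = rows.map { |r| fingerprint(r, %w[name surname age email phone]) }
ids_name = rows.map { |r| fingerprint(r, %w[name surname]) }

puts ids_all.uniq.size   # 4: every row gets its own document ID
puts ids_name.uniq.size  # 1: every row would target the same document
```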
I used the Ruby code block to remove the fields with empty values, but after fingerprinting I still cannot get the desired output. Please help!
Solution
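One change that should produce the desired document is to fingerprint only the identifying columns in the second pipeline, i.e. `source => ["name","surname"]`, so every row for one person gets the same document ID. With `action => "update"` and `doc_as_upsert => "true"`, the first event creates the document and later events are applied as partial updates that add their fields. The sketch below simulates that in plain Ruby; the in-memory Hash standing in for the index and the ID string are assumptions, not the Elasticsearch or fingerprint-filter internals.

```ruby
require "digest"

# Simulates action => "update" with doc_as_upsert => "true" on an in-memory
# "index" (document ID => source). Hash#merge is shallow; Elasticsearch merges
# objects recursively, which is equivalent here because every field is flat.
def upsert(index, id, doc)
  index[id] = index.fetch(id, {}).merge(doc)
end

# Events as read back from null_problem, i.e. with empty fields already removed.
events = [
  { "name" => "Busra", "surname" => "Duygu", "age" => "99", "phone" => "05555555555" },
  { "name" => "Busra", "surname" => "Duygu", "email" => "busraduygu@gmail.com" },
  { "name" => "Busra", "surname" => "Duygu", "age" => "99" },
  { "name" => "Busra", "surname" => "Duygu" }
]

index = {}
events.each do |event|
  # Hypothetical ID from name and surname only.
  id = Digest::SHA1.hexdigest("#{event['name']}|#{event['surname']}")
  upsert(index, id, event)
end

p index.values
```

This relies on the first pipeline having removed empty fields: an empty or null value sent in a partial update would likely overwrite a value already stored. If two rows ever carried different non-empty values for the same field, the last one indexed would win, and with several pipeline workers that order may not be guaranteed.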