Extracting XML data with Logstash and XPath

Problem description

I have this sample XML data:

<root>
    <actors>
        <actor id="1" name="Christian Bale"></actor>
        <actor id="2" name="Liam Neeson"></actor>
        <actor id="3" name="Michael Caine"></actor>
    </actors>   
</root>

This is my Logstash configuration for extracting the data:

input {
    file {
        path => "D:/data.xml"
        start_position => "beginning"
        sincedb_path => "NUL"
        exclude => "*.gz"
        type => "xml"
        codec => multiline {
            pattern => "<?xml "
            negate => "true"
            what => "previous"
        }
    }
}

filter {
    xml {
        source => "message"
        store_xml => true
        target => "id"
        target => "root"
        xpath => [
            "/root/actors/actor/text()", "actor"
        ]
    }
}

output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        index => "actor"
    }

    stdout {
        codec => rubydebug
    }
}

When I run this configuration, I get the output shown in the screenshot below:

[screenshot of the output]

But what I need is an index named actor whose columns are created from the attributes of each actor element, namely id and name.

This is the log from running the configuration:

"Using bundled JDK: ""
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/C:/Users/CHEEWE~1.NGA/AppData/Local/Temp/jruby-11656/jruby15503754749915308062jopenssl.jar) to field java.security.MessageDigest.provider
WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Sending Logstash logs to D:/logstash/logs which is now configured via log4j2.properties
[2020-12-07T17:54:43,527][INFO ][logstash.runner] Starting Logstash {"logstash.version"=>"7.10.0", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 11.0.8+10 on 11.0.8+10 +indy +jit [mswin32-x86_64]"}
[2020-12-07T17:54:43,843][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-12-07T17:54:45,899][INFO ][org.reflections.Reflections] Reflections took 43 ms to scan 1 urls, producing 23 keys and 47 values
[2020-12-07T17:54:47,229][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2020-12-07T17:54:47,482][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2020-12-07T17:54:47,544][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2020-12-07T17:54:47,551][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2020-12-07T17:54:47,618][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2020-12-07T17:54:47,689][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2020-12-07T17:54:47,786][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1, "index.lifecycle.name"=>"logstash-policy", "index.lifecycle.rollover_alias"=>"logstash"}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2020-12-07T17:54:47,846][INFO ][logstash.outputs.elasticsearch][main] Creating rollover alias <logstash-{now/d}-000001>
[2020-12-07T17:54:47,964][INFO ][logstash.javapipeline][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["D:/logstash/bin/logstash-simple.conf"], :thread=>"#<Thread:0x78c4a90f run>"}
[2020-12-07T17:54:49,256][INFO ][logstash.javapipeline][main] Pipeline Java execution initialization time {"seconds"=>1.29}
[2020-12-07T17:54:49,347][INFO ][logstash.javapipeline][main] Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[2020-12-07T17:54:49,446][INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-12-07T17:54:49,757][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}

Tags: logstash

Solution


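If Elasticsearch and Logstash are both running recent versions, ILM (index lifecycle management) is enabled by default. In that case the value of the index option is ignored and events go to the default rollover index, logstash-{now/d}-000001, which is the rollover alias the log above shows being created. To set the index name with the index option, set the ilm_enabled option to false in the elasticsearch output.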
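As a minimal sketch, the output section with ILM turned off might look like the following. It reuses the hosts and index values from the question and only adds ilm_enabled, an option of the elasticsearch output plugin that accepts true, false, or auto. The rest of the pipeline is assumed to stay unchanged.

```
output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        # Turn ILM off so the rollover alias no longer overrides the index option
        ilm_enabled => false
        index => "actor"
    }

    stdout {
        codec => rubydebug
    }
}
```

With this change, events should be written to the actor index instead of the logstash rollover alias. Listing the indices, for example with GET _cat/indices, is one way to check.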

