Elasticsearch query aggregation per hour of @timestamp

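Problem description

I am querying Metricbeat data in Elasticsearch to evaluate the most used processes per hour. At the moment I am aggregating by each process's start time and process name, and I need to "split" these groups per hour using the "@timestamp" field.

This is my actual query: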

GET metricbeat*/_search?
{"query": {
          "bool": {
                "must": [
                    { "wildcard" : { "beat.hostname" : "ibmcx*" }},
                    { "range": {
                      "@timestamp": {
                        "gte": "2019-03-22T00:00:00",
                        "lte": "2019-03-23T00:00:00"}}},
                    {"terms" : { "beat.hostname" : ["ibmcxapp101", "ibmcxapp102", "ibmcxapp103",
                                    "ibmcxapp104", "ibmcxapp105", "ibmcxapp106", "ibmcxapp107",
                                    "ibmcxapp108", "ibmcxapp109", "ibmcxapp110", "ibmcxapp111",
                                    "ibmcxapp112", "ibmcxapp113", "ibmcxapp114", "ibmcxapp115",
                                    "ibmcxapp116", "ibmcxapp117", "ibmcxapp118", "ibmcxapp119",
                                    "ibmcxapp120", "ibmcxapp121", "ibmcxapp122", "ibmcxxaa100",
                                    "ibmcxxaa101", "ibmcxxaa102", "ibmcxxaa103", "ibmcxxaa104",
                                    "ibmcxxaa105", "ibmcxxaa106", "ibmcxxaa107", "ibmcxxaa108",
                                    "ibmcxxaa109", "ibmcxxaa110", "ibmcxxaa111", "ibmcxxaa112",
                                    "ibmcxxaa201", "ibmcxxaa202", "ibmcxxaa203", "ibmcxxaa204"
                                    ] }},
                    {"exists": {"field": "system.process.cmdline"}}
                ],
                "must_not": [
                   {"term" : { "system.process.username" : "NT AUTHORITY\\SYSTEM" }},
                   {"term" : { "system.process.username" : "NT AUTHORITY\\NETWORK SERVICE" }},
                   {"term" : { "system.process.username" : "NT AUTHORITY\\LOCAL SERVICE" }},
                   {"term" : { "system.process.username" : "NT AUTHORITY\\Servicio de red"}},
                   {"term" : { "system.process.username" : "" }}
                  ]
          }
        },
        "size": 0,
        "aggs": {
          "group_by_start_time": {
            "terms": {
              "field": "system.process.cpu.start_time"
            },
            "aggs": {
              "group_by_name": {
                "terms": {
                  "field": "system.process.name.keyword"
                }
              }
            }
          }
        },
        "sort" : [
            { "system.process.cpu.start_time" : {"order" : "asc"}},
            { "@timestamp" : {"order" : "asc"}},
            { "system.process.pid" : {"order" : "desc"}}
        ]}

Tags: elasticsearch

Solution


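This is a bit hard to follow and reproduce. A minimal example (I don't think the whole query is really needed) and some sample documents would go a long way.

If you want an hourly aggregation, the first thing to do is bucket the documents by hour with a date_histogram aggregation, and then run the other aggregations inside it.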

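A minimal example of an hourly aggregation is (on Elasticsearch 7.2 and later, "interval" is deprecated in favor of "calendar_interval"):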

POST /metricbeat*/_search?size=0
{
    "aggs" : {
        "metrics_per_hour" : {
            "date_histogram" : {
                "field" : "@timestamp",
                "interval" : "hour"
            }
        }
    }
}
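
The response to a request like this has one bucket per hour under aggregations.metrics_per_hour.buckets, each with a "key" (epoch milliseconds) and a "doc_count". As a rough sketch of reading those buckets in Python: the helper name hourly_counts and the sample response below are made up for illustration, and the counts are not real data.

```python
from datetime import datetime, timezone


def hourly_counts(response, agg_name="metrics_per_hour"):
    """Return (hour, doc_count) pairs from a date_histogram response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [
        # "key" is the bucket start in epoch milliseconds
        (datetime.fromtimestamp(b["key"] / 1000, tz=timezone.utc), b["doc_count"])
        for b in buckets
    ]


# Hypothetical response fragment, only to show the shape of the buckets.
sample = {
    "aggregations": {
        "metrics_per_hour": {
            "buckets": [
                {"key_as_string": "2019-03-22T00:00:00.000Z",
                 "key": 1553212800000, "doc_count": 3},
                {"key_as_string": "2019-03-22T01:00:00.000Z",
                 "key": 1553216400000, "doc_count": 5},
            ]
        }
    }
}

for hour, count in hourly_counts(sample):
    print(hour.isoformat(), count)
```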

Nesting another aggregation inside the hourly one looks like this:

POST /metricbeat*/_search?size=0
{
    "aggs" : {
        "metrics_per_hour" : {
            "date_histogram" : {
                "field" : "@timestamp",
                "interval" : "hour"
            },
            "aggs" : {
                ...
            }
        }
    }
}
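
To make the nesting concrete, here is a sketch that builds one possible request body in Python, with the two terms aggregations from the question placed inside the hourly buckets. The function name hourly_process_query and the top_n parameter are hypothetical, the hostname and username filters from the question are left out for brevity, and using "filter" instead of "must" is my own choice (scoring does not matter when size is 0), not something the question requires.

```python
import json


def hourly_process_query(start, end, top_n=10):
    """Build a search body: hourly buckets > start time > process name."""
    return {
        "size": 0,  # only aggregations are needed, no hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                    {"exists": {"field": "system.process.cmdline"}},
                ]
            }
        },
        "aggs": {
            "metrics_per_hour": {
                # on Elasticsearch 7.2+ "calendar_interval" replaces "interval"
                "date_histogram": {"field": "@timestamp", "interval": "hour"},
                "aggs": {
                    "group_by_start_time": {
                        "terms": {
                            "field": "system.process.cpu.start_time",
                            "size": top_n,
                        },
                        "aggs": {
                            "group_by_name": {
                                "terms": {
                                    "field": "system.process.name.keyword",
                                    "size": top_n,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = hourly_process_query("2019-03-22T00:00:00", "2019-03-23T00:00:00")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST /metricbeat*/_search request.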

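PS: If you are using a daily index pattern, you can query the index for the right date instead of using a wildcard, and then skip this part of the query: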

"range": {
    "@timestamp": {
        "gte": "2019-03-22T00:00:00",
        "lte": "2019-03-23T00:00:00"
    }
}
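
As a small sketch of that idea, the index name for a given day can be derived from the date. This assumes the Metricbeat 6.x default naming of metricbeat-<version>-yyyy.MM.dd; the helper name daily_index is hypothetical, and the version part is left as a wildcard because the actual version is not given in the question. Check the real index names on your cluster (for example with GET _cat/indices/metricbeat-*) before relying on this.

```python
from datetime import date


def daily_index(day, prefix="metricbeat", version="*"):
    """Return the daily index name (or pattern) for one calendar day."""
    # Assumed layout: <prefix>-<version>-yyyy.MM.dd
    return f"{prefix}-{version}-{day:%Y.%m.%d}"


index = daily_index(date(2019, 3, 22))
print(index)  # metricbeat-*-2019.03.22
```

The search then targets that index only, for example POST /metricbeat-*-2019.03.22/_search?size=0, and the range clause on "@timestamp" can be dropped.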
