Elasticsearch: check whether all values in a given time range are greater than a threshold X

Problem description

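I want to create an alert in Kibana using an Elasticsearch query. I am using the Open Distro alerting feature. I want to check whether all values of the cpu.pct field in the last 10 minutes are greater than 50, and raise an alert if they are.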

{
"size": 500,
"query": {
    "bool": {
        "filter": [
            {
                "match_all": {
                    "boost": 1
                }
            },
            {
                "match_phrase": {
                    "client.id": {
                        "query": "42",
                        "slop": 0,
                        "zero_terms_query": "NONE",
                        "boost": 1
                    }
                }
            },
            {
                "range": {
                    "cpu.pct": {
                        "from": 10,
                        "to": null,
                        "include_lower": true,
                        "include_upper": true,
                        "boost": 1
                    }
                }
            },
            {
                "range": {
                    "@timestamp": {
                        "from": "{{period_end}}||-5m",
                        "to": "{{period_end}}",
                        "include_lower": true,
                        "include_upper": true,
                        "format": "epoch_millis",
                        "boost": 1
                    }
                }
            }
        ],
        "adjust_pure_negative": true,
        "boost": 1
    }
},
"aggregations": {
    "2": {
        "terms": {
            "field": "client.name.keyword",
            "size": 5,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": {
                "_key": "desc"
            }
        },
        "aggregations": {
            "3": {
                "terms": {
                    "field": "component.name",
                    "size": 1000,
                    "min_doc_count": 1,
                    "shard_min_doc_count": 0,
                    "show_term_doc_count_error": false,
                    "order": [
                        {
                            "1": "desc"
                        },
                        {
                            "_key": "asc"
                        }
                    ]
                },
                "aggregations": {
                    "1": {
                        "avg": {
                            "field": "cpu.pct"
                        }
                    }
                }
            }
        }
    }
}
}

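The query above computes the average, but that is not the right check: two windows with the same average can call for different outcomes.

Negative example: values (100, 100, 100, 100, 100, 100, 0, 0, 0, 0) | raise alert: no (average: 60)

Positive example: values (60, 60, 60, 60, 60, 60, 60, 60, 60, 60) | raise alert: yes (average: 60)

How can I check all the values?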

Tags: elasticsearch, kibana, elasticsearch-opendistro

Solution


I'm not sure which application you use to trigger the alert. One way to handle your case is to use two filter aggregations:

  1. totalInLast10Min: counts all documents in the last 10 minutes.
  2. totalInLast10MinAboveTh: counts the documents in the last 10 minutes whose field value is above the threshold.

If totalInLast10Min == totalInLast10MinAboveTh, trigger the alert.
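The counting idea can be sketched in plain Python, independent of Elasticsearch, using the two example windows from the question. The function name `all_above_by_count` is hypothetical, and the extra `total > 0` guard is an assumption: without it an empty window (0 == 0) would also raise the alert, which is probably not wanted.

```python
def all_above_by_count(values, threshold):
    """Return True only if every value in the window exceeds the threshold."""
    total = len(values)                              # role of totalInLast10Min
    above = sum(1 for v in values if v > threshold)  # role of totalInLast10MinAboveTh
    # Assumption: a window with no documents should not raise an alert.
    return total > 0 and total == above


negative = [100, 100, 100, 100, 100, 100, 0, 0, 0, 0]  # average 60
positive = [60] * 10                                    # average 60

print(all_above_by_count(negative, 50))  # False
print(all_above_by_count(positive, 50))  # True
```

Both windows average 60, yet only the second one passes, which is the distinction the average-based query cannot make.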

For example:

Create the index

PUT test
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

Add some documents

POST test/_doc
{"cpu":20,"timestamp":"2020-08-18 20:20:00"}

POST test/_doc
{"cpu":100,"timestamp":"2020-08-18 20:21:00"}

POST test/_doc
{"cpu":90,"timestamp":"2020-08-18 20:29:00"}

Query:

GET test/_search
{
  "size": 0,
  "aggs": {
    "totalInLast10Min": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "2020-08-18 20:20:00"
          }
        }
      }
    },
    "totalInLast10MinAboveTh": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "timestamp": {
                  "gte": "2020-08-18 20:20:00"
                }
              }
            },
            {
              "range": {
                "cpu": {
                  "gte": 80
                }
              }
            }
          ]
        }
      }
    }
  }
}
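The query above hard-codes a timestamp and a threshold of 80 for the demo index. For the situation in the question, the same request body could be generated along these lines. This is only a sketch: `build_count_query` is a hypothetical helper, the field names `@timestamp` and `cpu.pct` are taken from the question, `now-10m` assumes Elasticsearch date math is acceptable for the time field, and `gt` is used on the assumption that "greater than 50" is meant strictly.

```python
import json


def build_count_query(time_field, value_field, threshold, window="10m"):
    """Build a search body with the two filter aggregations described above."""
    # Relative time window, e.g. "now-10m" (assumes date math on the time field).
    time_range = {"range": {time_field: {"gte": f"now-{window}"}}}
    return {
        "size": 0,  # only the aggregation counts are needed, not the hits
        "aggs": {
            "totalInLast10Min": {"filter": time_range},
            "totalInLast10MinAboveTh": {
                "filter": {
                    "bool": {
                        "must": [
                            time_range,
                            {"range": {value_field: {"gt": threshold}}},
                        ]
                    }
                }
            },
        },
    }


body = build_count_query("@timestamp", "cpu.pct", 50)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a `_search` request. Any additional filters from the question, such as the one on client.id, would still have to be added to both aggregations or to a top-level query.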

Sample result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "totalInLast10MinAboveTh" : {
      "meta" : { },
      "doc_count" : 2
    },
    "totalInLast10Min" : {
      "meta" : { },
      "doc_count" : 3
    }
  }
}

Based on the doc_count of the two aggregations, you can write the condition that decides when to trigger the alert.
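As an illustration of such a condition, the sketch below reads the two counts from a response shaped like the sample result and decides whether to alert. The function name `alert_from_response` is hypothetical, and the requirement of at least one document is an added assumption. In Open Distro alerting the equivalent comparison would presumably be written as a trigger condition over the monitor's query result rather than in Python.

```python
def alert_from_response(response):
    """Decide whether to alert from the two filter-aggregation counts."""
    aggs = response["aggregations"]
    total = aggs["totalInLast10Min"]["doc_count"]
    above = aggs["totalInLast10MinAboveTh"]["doc_count"]
    # Alert only when every document in the window is above the threshold.
    # Assumption: an empty window (total == 0) should not alert.
    return total > 0 and total == above


# Counts copied from the sample result: 3 documents, 2 of them above the threshold.
sample = {
    "aggregations": {
        "totalInLast10MinAboveTh": {"meta": {}, "doc_count": 2},
        "totalInLast10Min": {"meta": {}, "doc_count": 3},
    }
}

print(alert_from_response(sample))  # False
```

With the sample data no alert is raised, because the document with cpu 20 falls below the threshold.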

