Elasticsearch Aggregations API

ruhuanxingyun 2020-02-13 18:01

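Overview: the aggregations framework computes aggregated data over the documents matched by a search query. The syntax is as follows:

"aggregations" : {                                      // may be abbreviated as "aggs"
    "<aggregation_name>" : {                            // aggregation name, a unique identifier
        "<aggregation_type>" : {                        // aggregation type
            <aggregation_body>                          // aggregation body, e.g. which field to aggregate on
        }
        [,"meta" : {  [<meta_data_body>] } ]?           // optional metadata
        [,"aggregations" : { [<sub_aggregation>]+ } ]?  // optional sub-aggregations
    }
    [,"<aggregation_name_2>" : { ... } ]*               // further aggregations
}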

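I. Metric Aggregations: compute statistics over the documents in a bucket

  1. Top Hits: returns the top matching documents in each bucket, similar to LIMIT in MySQL

    A. URL: POST /index/_search?size=0

    B. Request parameters

      from: offset of the first hit to return;

      size: maximum number of top hits to return per bucket, default 3;

      sort: how the top hits are ordered; by default, by score.

    C. Kibana query

    D. Java implementation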

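TopHitsAggregationBuilder aggregationBuilder = AggregationBuilders.topHits("top_hits").sort("time", SortOrder.DESC).size(1);

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  TopHits topHits = aggregations.get("top_hits");
  SearchHit[] hits = topHits.getHits().getHits();
}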
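To make the semantics concrete, here is a minimal in-memory sketch in plain Java (no Elasticsearch client) of what top_hits sorted by `time` descending does inside each bucket: sort the bucket's documents, skip `from`, keep `size`. The `Doc` class, the grouping key and the sample data are hypothetical; Elasticsearch performs this per bucket on the server side.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TopHitsSketch {
    // Hypothetical document: a grouping key and a "time" value (epoch millis)
    static final class Doc {
        final String group;
        final long time;

        Doc(String group, long time) {
            this.group = group;
            this.time = time;
        }

        @Override
        public String toString() {
            return group + "@" + time;
        }
    }

    // Emulates top_hits with sort("time", DESC), from and size, applied to one bucket
    static List<Doc> topHits(List<Doc> bucket, int from, int size) {
        return bucket.stream()
                .sorted(Comparator.comparingLong((Doc d) -> d.time).reversed())
                .skip(from)
                .limit(size)
                .collect(Collectors.toList());
    }

    // Groups documents into buckets (like a terms aggregation) and keeps each bucket's top hits
    static Map<String, List<Doc>> topHitsPerGroup(List<Doc> docs, int size) {
        Map<String, List<Doc>> buckets = docs.stream()
                .collect(Collectors.groupingBy(d -> d.group, TreeMap::new, Collectors.toList()));
        Map<String, List<Doc>> result = new LinkedHashMap<>();
        buckets.forEach((key, bucket) -> result.put(key, topHits(bucket, 0, size)));
        return result;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 1L), new Doc("a", 5L), new Doc("b", 3L), new Doc("b", 2L));
        System.out.println(topHitsPerGroup(docs, 1)); // {a=[a@5], b=[b@3]}
    }
}
```

With `size(1)` and a descending sort, each bucket keeps only its latest document, which is the "latest record per group" pattern the Java request above is built for.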

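  2. Cardinality: counts distinct values of a field, similar to COUNT(DISTINCT field) in MySQL; the result is an approximation

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose distinct values are counted;

      script: script.

    C. Kibana query

    D. Java implementation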

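CardinalityAggregationBuilder aggregationBuilder = AggregationBuilders.cardinality("cardinality").field("cid");

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  Cardinality cardinality = aggregations.get("cardinality");
  long count = cardinality.getValue(); // approximate distinct count
}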
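As a point of reference, the sketch below computes in plain Java (no Elasticsearch client) the exact value that COUNT(DISTINCT cid) would give. `null` entries are a hypothetical stand-in for documents without a `cid` value, which the aggregation skips. Elasticsearch's cardinality aggregation only approximates this number (it is based on HyperLogLog++ and tunable via `precision_threshold`), so on large data sets the two can differ slightly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class CardinalitySketch {
    // Exact COUNT(DISTINCT cid): drop missing values, then collapse duplicates
    static long distinctCount(List<String> cidValues) {
        return cidValues.stream().filter(Objects::nonNull).distinct().count();
    }

    public static void main(String[] args) {
        // One entry per hypothetical document; null means the document has no cid
        List<String> cids = Arrays.asList("c1", "c2", "c1", null, "c3");
        System.out.println(distinctCount(cids)); // 3
    }
}
```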

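  3. Max: returns the maximum value of a field

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose maximum is computed;

      script: script.

    C. Kibana query

    D. Java implementation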

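MaxAggregationBuilder aggregationBuilder = AggregationBuilders.max("max").field("timestamp");

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  ParsedMax max = aggregations.get("max");
  // For a date field, getValue() is epoch millis and getValueAsString() the formatted date
  String timestamp = max.getValueAsString();
}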

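  4. Min: returns the minimum value of a field

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose minimum is computed;

      script: script.

    C. Kibana query

    D. Java implementation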

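MinAggregationBuilder aggregationBuilder = AggregationBuilders.min("min").field("timestamp");

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  ParsedMin min = aggregations.get("min");
  String timestamp = min.getValueAsString();
}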

  5. Sum: returns the sum of a field's values

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to sum;

      script: script.

    C. Kibana query

    D. Java implementation

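SumAggregationBuilder aggregationBuilder = AggregationBuilders.sum("sum").field("low");

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  // Look the result up by the aggregation name ("sum"), not by the field name ("low")
  Sum sum = aggregations.get("sum");
  double low = sum.getValue();
}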
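The sketch below restates the sum in plain Java (no Elasticsearch client): documents without the `low` field contribute nothing, and an empty input sums to 0.0 rather than having no value. Modelling missing fields as `null` entries is a hypothetical representation chosen for illustration; the numbers are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class SumSketch {
    // Sums the "low" values; null stands in for a document without the field and is skipped
    static double sum(List<Double> lowValues) {
        return lowValues.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1.5, null, 2.5))); // 4.0
        System.out.println(sum(List.of()));                     // 0.0
    }
}
```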

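  6. Avg: returns the average of a field's values

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to average;

      script: script.

    C. Kibana query

    D. Java implementation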
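By default the avg aggregation ignores documents that have no value for the field; its `missing` parameter instead substitutes a fixed value for them. The plain-Java sketch below contrasts the two behaviours. `null` entries are a hypothetical stand-in for documents without the field, and the grades are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

final class AvgSketch {
    // Default behaviour: documents without a value are left out of the average
    static OptionalDouble avg(List<Double> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .average();
    }

    // Like setting "missing": documents without a value count as if they had `missing`
    static OptionalDouble avgWithMissing(List<Double> values, double missing) {
        return values.stream()
                .mapToDouble(v -> v == null ? missing : v)
                .average();
    }

    public static void main(String[] args) {
        List<Double> grades = Arrays.asList(80.0, null, 100.0);
        System.out.println(avg(grades).getAsDouble());                 // 90.0
        System.out.println(avgWithMissing(grades, 0.0).getAsDouble()); // 60.0
    }
}
```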

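  7. Stats: returns count, min, max, avg and sum of a field in a single aggregation

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field to compute statistics on;

      script: script.

    C. Kibana query

    D. Java implementation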
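The stats aggregation returns count, min, max, avg and sum from a single pass, much like the JDK's `DoubleSummaryStatistics`. The sketch below uses that JDK class purely as an analogy on invented values; it is not how Elasticsearch computes the result.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;

final class StatsSketch {
    // One pass collects count, min, max, sum and average together
    static DoubleSummaryStatistics stats(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics s = stats(List.of(2.0, 4.0, 9.0));
        // Same five fields as a stats response; Locale.ROOT keeps "." as the decimal separator
        System.out.println(String.format(Locale.ROOT, "count=%d min=%.1f max=%.1f avg=%.1f sum=%.1f",
                s.getCount(), s.getMin(), s.getMax(), s.getAverage(), s.getSum()));
        // prints count=3 min=2.0 max=9.0 avg=5.0 sum=15.0
    }
}
```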

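  8. Value Count: counts the values extracted from the documents; duplicate values are counted again

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the field whose values are counted;

      script: script.

    C. Kibana query

    D. Java implementation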

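ValueCountAggregationBuilder aggregationBuilder = AggregationBuilders.count("count").field("cid");

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  ValueCount valueCount = aggregations.get("count");
  long count = valueCount.getValue();
}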
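To contrast Value Count with Cardinality: value_count counts every value it extracts, so a duplicate is counted again and a multi-valued field contributes one count per value. In the plain-Java sketch below, each inner list holds a hypothetical document's `cid` values; the distinct count is shown alongside for comparison.

```java
import java.util.List;

final class ValueCountSketch {
    // value_count: every extracted value counts, duplicates and multi-valued entries included
    static long valueCount(List<List<String>> cidsPerDoc) {
        return cidsPerDoc.stream().mapToLong(cids -> cids.size()).sum();
    }

    // cardinality, for contrast: only distinct values count
    static long distinctCount(List<List<String>> cidsPerDoc) {
        return cidsPerDoc.stream().flatMap(List::stream).distinct().count();
    }

    public static void main(String[] args) {
        // Three hypothetical documents: one value, two values, no value
        List<List<String>> docs = List.of(List.of("c1"), List.of("c1", "c2"), List.<String>of());
        System.out.println(valueCount(docs));    // 3
        System.out.println(distinctCount(docs)); // 2
    }
}
```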

 

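II. Bucket Aggregations: each bucket is the set of documents that meet a given criterion

  1. Terms: groups documents by the values of a field, similar to GROUP BY in MySQL; the counts are not always exact, because each shard contributes only its own top terms

    A. URL: GET /index/_search

    B. Request parameters

      field: the field to group by; only a single field is supported;

      size: number of buckets (terms) to return, default 10; a larger size gives more accurate counts but costs more;

      order: sort order of the returned buckets;

      script: script; use it only to group by a combination of two fields, and preferably not at all, because it is slow.

    C. Kibana query

    D. Java implementation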

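// Grouping by two fields through a script works, but it is slow and best avoided:
// Script script = new Script("doc['data.srcip'].value + '_' + doc['data.dstip'].value");
// TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").script(script).size(Integer.MAX_VALUE);

// A huge size asks for all terms, at a correspondingly high cost
TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("terms").field("data.ip").size(Integer.MAX_VALUE);

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  Terms terms = aggregations.get("terms");
  for (Terms.Bucket bucket : terms.getBuckets()) {
    String ip = bucket.getKeyAsString();  // group key
    long docCount = bucket.getDocCount(); // documents in the group
  }
}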
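The inaccuracy of Terms comes from how it runs on a multi-shard index: each shard returns only its local top `shard_size` terms, and the coordinating node merges those partial lists. The plain-Java simulation below reproduces this with two hypothetical shards and invented counts. It is a simplification: Elasticsearch's default `shard_size` is `size * 1.5 + 10`, while here `shardSize` is passed explicitly and kept tiny to make the error visible.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TermsShardSketch {
    // Top n entries of a term -> count map, highest count first (ties broken by term)
    static Map<String, Long> top(Map<String, Long> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (x, y) -> x, LinkedHashMap::new));
    }

    // Each shard sends only its local top shardSize terms; the coordinator sums what it receives
    static Map<String, Long> termsAgg(List<Map<String, Long>> shards, int size, int shardSize) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            top(shard, shardSize).forEach((term, count) -> merged.merge(term, count, Long::sum));
        }
        return top(merged, size);
    }

    public static void main(String[] args) {
        List<Map<String, Long>> shards = List.of(
                Map.of("a", 10L, "b", 9L, "c", 8L),
                Map.of("c", 12L, "d", 5L, "b", 3L));
        System.out.println(termsAgg(shards, 2, 2)); // {c=12, a=10}
        System.out.println(termsAgg(shards, 2, 3)); // {c=20, b=12}
    }
}
```

With `shardSize` equal to `size`, term c is undercounted (12 instead of 20) and b (12 in total) is pushed out by a (10); once every shard sends all its terms the result is exact, which is why a larger size is more accurate but more expensive.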

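  2. Filter: narrows the matched documents further with an additional query

    A. URL: POST /index/_search?size=0

    B. Request parameters: a query, written as in the Query DSL

    C. Kibana query

    D. Java implementation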

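FilterAggregationBuilder aggregationBuilder = AggregationBuilders.filter("filter", QueryBuilders.termsQuery("rule", new String[]{"login", "auth", "cca"}));

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  Filter filter = aggregations.get("filter");
  long docCount = filter.getDocCount(); // matched documents whose rule is login, auth or cca
}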
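A filter aggregation yields a single bucket holding the matched documents that also satisfy its own query, here a terms query on `rule`. The plain-Java sketch below computes that bucket's document count over a hypothetical list holding each matched document's `rule` value.

```java
import java.util.List;
import java.util.Set;

final class FilterAggSketch {
    // The values of the terms query on "rule"
    private static final Set<String> RULES = Set.of("login", "auth", "cca");

    // doc_count of the "filter" bucket: matched documents whose rule is one of RULES
    static long filterDocCount(List<String> ruleOfEachMatchedDoc) {
        return ruleOfEachMatchedDoc.stream().filter(RULES::contains).count();
    }

    public static void main(String[] args) {
        List<String> rules = List.of("login", "logout", "auth", "cca", "login");
        System.out.println(filterDocCount(rules)); // 4
    }
}
```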

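  3. Range: counts documents per value range; each range includes its from value and excludes its to value

    A. URL: GET /index/_search

    B. Request parameters

      field: name of the field the ranges apply to;

      to value1: from * to value1, excluding value1;

      from value1 - to value2: from value1 to value2, including value1 but excluding value2;

      from value2: from value2 to *, including value2.

    C. Kibana query

    D. Java implementation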

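// Buckets: "1" = (*, 6), "2" = [6, 11), "3" = [11, *)
RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("range").field("level").addUnboundedTo("1", 6).addRange("2", 6, 11).addUnboundedFrom("3", 11);

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  Range range = aggregations.get("range");
  for (Range.Bucket bucket : range.getBuckets()) {
    String key = bucket.getKeyAsString();
    long docCount = bucket.getDocCount();
  }
}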
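The sketch below spells out, in plain Java, which bucket a `level` value lands in for three ranges keyed "1" (below 6), "2" (6 to 11) and "3" (11 and above), showing that `from` is inclusive and `to` is exclusive. Buckets that receive no documents are still reported with a count of 0, since the range aggregation returns every defined range. The sample levels are invented.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RangeAggSketch {
    // Key of the bucket a level falls into
    static String bucketOf(double level) {
        if (level < 6) {
            return "1"; // (*, 6): 6 itself is excluded
        }
        if (level < 11) {
            return "2"; // [6, 11): from included, to excluded
        }
        return "3";     // [11, *)
    }

    // Doc count per bucket; every defined bucket starts at 0
    static Map<String, Long> docCounts(List<Double> levels) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : new String[] {"1", "2", "3"}) {
            counts.put(key, 0L);
        }
        for (double level : levels) {
            counts.merge(bucketOf(level), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(docCounts(List.of(5.0, 6.0, 10.9, 11.0))); // {1=1, 2=2, 3=1}
    }
}
```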

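  4. Date histogram: buckets documents by date interval; works on date and date range values

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the date field;

      format: date format of the bucket keys;

      calendar_interval: calendar-aware interval, e.g. 1d; only a single unit is allowed, so 2d is not valid here;

      fixed_interval: fixed-length interval, e.g. 1000ms;

      min_doc_count: minimum document count; buckets with fewer documents are omitted.

    C. Kibana query

    D. Java implementation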

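DateHistogramAggregationBuilder aggregationBuilder = AggregationBuilders.dateHistogram("date_histogram")
  .field("timestamp")
  .format("yyyy-MM-dd")
  .calendarInterval(new DateHistogramInterval("1d"))
  .minDocCount(1);

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(aggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  ParsedDateHistogram histogram = aggregations.get("date_histogram");
  for (Histogram.Bucket bucket : histogram.getBuckets()) {
    String day = bucket.getKeyAsString(); // formatted as yyyy-MM-dd
    long docCount = bucket.getDocCount();
  }
}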
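A plain-Java sketch of what this request computes: each timestamp (epoch milliseconds) is truncated to its calendar day and counted, and because `minDocCount(1)` is set, days without documents do not appear. It assumes UTC, which is also Elasticsearch's default when no `time_zone` is given; the timestamps are invented.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class DateHistogramSketch {
    // 1d calendar buckets in UTC, keyed like format("yyyy-MM-dd")
    static Map<String, Long> dailyCounts(List<Long> timestampsMillis) {
        return timestampsMillis.stream()
                .map(ms -> Instant.ofEpochMilli(ms).atZone(ZoneOffset.UTC).toLocalDate().toString())
                // Only days that occur get a bucket, matching minDocCount(1)
                .collect(Collectors.groupingBy(day -> day, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Long> ts = List.of(1580688000000L, 1580774399000L, 1580860800000L);
        System.out.println(dailyCounts(ts)); // {2020-02-03=2, 2020-02-05=1}
    }
}
```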

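  5. Date range: counts documents per date range

    A. URL: POST /index/_search?size=0

    B. Request parameters

      field: name of the date field;

      format: date format of the range bounds;

      to value1: from * to value1, excluding value1;

      from value1 - to value2: from value1 to value2, including value1 but excluding value2;

      from value2: from value2 to *, including value2.

    C. Kibana query

    D. Java implementation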

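DateRangeAggregationBuilder dateRangeAggregationBuilder = AggregationBuilders.dateRange("day_range")
                    .field("day")
                    .format("yyyy-MM-dd")
                    .addUnboundedTo("1", "2020-02-03")               // (*, 2020-02-03)
                    .addRange("2", "2020-02-03", "2020-03-10")       // [2020-02-03, 2020-03-10)
                    .addUnboundedFrom("3", "2020-03-10");            // [2020-03-10, *)

// Attach the aggregation to the request; size(0) skips the regular search hits
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder().size(0).aggregation(dateRangeAggregationBuilder);
SearchRequest searchRequest = new SearchRequest("index").source(sourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
Aggregations aggregations = searchResponse.getAggregations();
// Mainly guards against the index not existing, in which case getAggregations() returns null
if (aggregations != null) {
  ParsedDateRange dateRange = aggregations.get("day_range");
}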
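The plain-Java sketch below shows which of three date ranges a `day` value in `yyyy-MM-dd` format falls into: "1" (before 2020-02-03), "2" (2020-02-03 up to 2020-03-10) and "3" (from 2020-03-10). As with numeric ranges, the boundary date belongs to the range that starts there. The input dates are hypothetical.

```java
import java.time.LocalDate;

final class DateRangeSketch {
    private static final LocalDate FROM_2 = LocalDate.parse("2020-02-03");
    private static final LocalDate FROM_3 = LocalDate.parse("2020-03-10");

    // Key of the range a yyyy-MM-dd date falls into
    static String bucketOf(String day) {
        LocalDate date = LocalDate.parse(day); // ISO yyyy-MM-dd
        if (date.isBefore(FROM_2)) {
            return "1"; // (*, 2020-02-03)
        }
        if (date.isBefore(FROM_3)) {
            return "2"; // [2020-02-03, 2020-03-10)
        }
        return "3";     // [2020-03-10, *)
    }

    public static void main(String[] args) {
        for (String day : new String[] {"2020-02-02", "2020-02-03", "2020-03-09", "2020-03-10"}) {
            System.out.println(day + " -> " + bucketOf(day)); // 1, 2, 2, 3
        }
    }
}
```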

 

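III. Pipeline Aggregations: take the output of other aggregations, rather than sets of documents, as their input; a typical use is paging through grouped results, as in a database

  1. Bucket Sort: sorts the buckets of its parent multi-bucket aggregation

    A. URL: POST /sales/_search?size=0

    B. Request parameters

      from: buckets in positions before this value are truncated; default 0. When paging, from should be a multiple of size;

      size: number of buckets to return; defaults to all buckets of the parent aggregation;

      sort: the sort definition; several fields may be listed.

    C. Kibana query

    D. Java implementation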
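Bucket Sort creates no buckets of its own; it reorders and truncates the buckets of its parent (for example a terms aggregation), typically by one of their metrics. The plain-Java sketch below mimics this for hypothetical buckets carrying a `total` metric, sorted in descending order. Page `p` (counting from 0) corresponds to `from = p * size`, which is why `from` should be a multiple of `size`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class BucketSortSketch {
    // Hypothetical parent bucket with one metric, e.g. a sum sub-aggregation
    static final class Bucket {
        final String key;
        final double total;

        Bucket(String key, double total) {
            this.key = key;
            this.total = total;
        }
    }

    // Sort the parent's buckets by total (descending), skip "from" buckets, keep "size"
    static List<String> bucketSort(List<Bucket> buckets, int from, int size) {
        return buckets.stream()
                .sorted(Comparator.comparingDouble((Bucket b) -> b.total).reversed())
                .skip(from)
                .limit(size)
                .map(b -> b.key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Bucket> buckets = List.of(new Bucket("a", 10), new Bucket("b", 30),
                new Bucket("c", 20), new Bucket("d", 5));
        int size = 2;
        int page = 1; // second page, counting from 0
        System.out.println(bucketSort(buckets, page * size, size)); // [a, d]
    }
}
```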

 

See also: Elasticsearch reference, Aggregations

    Elasticsearch reference, Aggregations (Java API)
