Is it possible to compute a "distinct sum" and a "distinct average" in Elasticsearch?

Question

How can I compute a "distinct average" in Elasticsearch? I have some denormalized data like this:

{ "record_id" : "100", "cost" : 42 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "200", "cost" : 67 }
{ "record_id" : "400", "cost" : 11 }
{ "record_id" : "400", "cost" : 11 }
{ "record_id" : "500", "cost" : 10 }
{ "record_id" : "600", "cost" : 99 }

Note that for a given "record_id", the "cost" is always the same.

So, given the data above:

  1. How do I get the average of the "cost" field, but DISTINCT by "record_id"? The result would be (42+67+11+10+99)/5 = 45.8

  2. How do I get the SUM of the "cost" field, but DISTINCT by "record_id"? The result would be 42+67+11+10+99 = 229
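
As a quick sanity check of the two expected results, here is a minimal Python sketch that deduplicates the sample documents by "record_id" before summing and averaging. The helper name `distinct_sum_and_avg` is made up for illustration, and the sketch relies on the stated assumption that "cost" is identical for every document sharing a "record_id".

```python
# Sample documents copied from the question.
docs = [
    {"record_id": "100", "cost": 42},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "400", "cost": 11},
    {"record_id": "400", "cost": 11},
    {"record_id": "500", "cost": 10},
    {"record_id": "600", "cost": 99},
]


def distinct_sum_and_avg(documents):
    """Return (sum, average) of cost, counting each record_id only once.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    cost_by_id = {}
    for doc in documents:
        # Assumption from the question: cost is the same for a given
        # record_id, so keeping the first value seen is sufficient.
        cost_by_id.setdefault(doc["record_id"], doc["cost"])
    total = sum(cost_by_id.values())
    avg = total / len(cost_by_id) if cost_by_id else float("nan")
    return total, avg


total, avg = distinct_sum_and_avg(docs)
print(total, avg)  # 229 45.8
```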

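Can I use a combination of a "terms" aggregation with "first" and "avg" sub-aggregations? I was thinking of something like the approach in "Elasticsearch calculate average of unique values".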

Tags: elasticsearch, aggregate-functions

Solution


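That won't work with terms aggs. Here is one possibility using a Painless script.

Indexing (your actual mapping may differ from the default one generated here, especially the .keyword part on record_id):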

POST _bulk
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"100","cost":42}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"200","cost":67}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"400","cost":11}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"400","cost":11}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"500","cost":10}
{"index":{"_index":"uniques","_type":"_doc"}}
{"record_id":"600","cost":99}

Then aggregate:

GET uniques/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "scripted_metric": {
        "init_script": "state.id_map = [:]; state.sum = 0.0; state.elem_count = 0.0;",
        "map_script": """
          def id = doc['record_id.keyword'].value;
          if (!state.id_map.containsKey(id)) {
            state.id_map[id] = true;
            state.elem_count++;
            state.sum += doc['cost'].value;
          }
        """,
        "combine_script": """
            def sum = state.sum;
            def avg = sum / state.elem_count;

            def stats = [:];
            stats.sum = sum;
            stats.avg = avg;

            return stats
        """,
        "reduce_script": "return states"
      }
    }
  }
}
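
To make the flow of the scripted_metric aggregation easier to follow, here is a rough Python simulation of its four phases. It is only a sketch: the function names (`init_state`, `map_doc`, `combine`, `reduce_states`, `run`) are made up for illustration, the zero-count guard is an addition that the Painless version does not need for double division, and the two-shard split at the end is a hypothetical layout, not something taken from the answer. Elasticsearch runs the init, map and combine scripts once per shard and the reduce script once on the coordinating node, so the deduplication in the script happens per shard.

```python
docs = [
    {"record_id": "100", "cost": 42},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "200", "cost": 67},
    {"record_id": "400", "cost": 11},
    {"record_id": "400", "cost": 11},
    {"record_id": "500", "cost": 10},
    {"record_id": "600", "cost": 99},
]


def init_state():
    # Mirrors init_script: one fresh state per shard.
    return {"id_map": {}, "sum": 0.0, "elem_count": 0.0}


def map_doc(state, doc):
    # Mirrors map_script: only the first document per record_id is counted.
    record_id = doc["record_id"]
    if record_id not in state["id_map"]:
        state["id_map"][record_id] = True
        state["elem_count"] += 1
        state["sum"] += doc["cost"]


def combine(state):
    # Mirrors combine_script. The guard is an addition for this sketch,
    # because Python raises on 0.0 / 0.0 instead of returning NaN.
    count = state["elem_count"]
    avg = state["sum"] / count if count else float("nan")
    return {"sum": state["sum"], "avg": avg}


def reduce_states(states):
    # Mirrors reduce_script: the per-shard results are returned unchanged.
    return states


def run(shards):
    """Simulate the aggregation over a list of shards (lists of docs)."""
    states = []
    for shard_docs in shards:
        state = init_state()
        for doc in shard_docs:
            map_doc(state, doc)
        states.append(combine(state))
    return reduce_states(states)


single_shard = run([docs])
print(single_shard)

# Hypothetical split: record_id "200" ends up on both shards.
two_shards = run([docs[:3], docs[3:]])
print(two_shards)
```

With a single primary shard the simulated result matches the output reported for the query. If the index had several shards, the returned list would hold one entry per shard, and a record_id present on more than one shard would be counted once on each of them, so the per-shard entries could not simply be added together.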

which yields:

...
"aggregations" : {
    "terms" : {
      "value" : [
        {
          "avg" : 45.8,
          "sum" : 229.0
        }
      ]
    }
  }
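
Because the reduce script returns the list of per-shard states as is, the result sits inside a list under "value". A small sketch of reading it in Python follows; the `response` dict below contains only the part of the response shown above, and `extract_distinct_stats` is a hypothetical helper, not a client library function.

```python
# Only the "aggregations" part shown above; the rest of the response is omitted.
response = {
    "aggregations": {
        "terms": {
            "value": [
                {"avg": 45.8, "sum": 229.0}
            ]
        }
    }
}


def extract_distinct_stats(resp, agg_name="terms"):
    """Return the list of per-shard stats produced by the scripted metric."""
    return resp["aggregations"][agg_name]["value"]


stats = extract_distinct_stats(response)
first = stats[0]  # a single entry is expected when the index has one shard
print(first["sum"], first["avg"])
```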
