How to get a "missing" aggregation bucket for a nested field

Problem description

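I'm trying to get a "missing" bucket in an Elasticsearch nested aggregation. The goal is to return how many documents have not been assigned a given category.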

Here are some (simplified) example documents:

[
    {
      "doc_id": 1,
      "categories": [
          {
              "field": "100",
              "category": "10"
          }
      ]
    },
    {
      "doc_id": 2,
      "categories": [
          {
              "field": "200",
              "category": "10"
          },
          {
              "field": "300",
              "category": "20"
          }
      ]
    },
    {
      "doc_id": 3
    }
]

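I want to see how many documents have a category, which categories those are, and, within each category, how many documents have each field selected. So I run a nested aggregation query like this: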

"aggregations": {
    "category": {
      "nested": {
        "path": "categories"
      },
      "aggregations": {
        "category": {
          "terms": {
            "field": "categories.category",
            "size": 50,
            "shard_size": 2147483647,
            "min_doc_count": 1,
            "shard_min_doc_count": 0,
            "show_term_doc_count_error": false,
            "order": [
              {
                "_count": "desc"
              },
              {
                "_key": "asc"
              }
            ]
          },
          "aggregations": {
            "categories": {
              "terms": {
                "field": "categories.field",
                "size": 50,
                "shard_size": 2147483647,
                "min_doc_count": 1,
                "shard_min_doc_count": 0,
                "show_term_doc_count_error": false,
                "order": [
                  {
                    "_count": "desc"
                  },
                  {
                    "_key": "asc"
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
  

This gives a response like this:

"hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "category" : {
      "doc_count" : 3, // Number of category entries set, so this can exceed the total hits
      "category" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "10", // Category id
            "doc_count" : 2, // Number of documents with this category
            "categories" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "100", // Field id
                  "doc_count" : 1 // Number of documents with this field
                },
                {
                  "key" : "200",
                  "doc_count" : 1
                }
              ]
            }
          },
          {
            "key" : "20",
            "doc_count" : 1,
            "categories" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "300",
                  "doc_count" : 1
                }
              ]
            }
          }
        ]
      }
    }
  }  

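Is there a way to include a bucket with the number of documents that do not have a given category set? An example of the desired response: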

"hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "category" : {
      "doc_count" : 3,
      "category" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "10",
            "doc_count" : 2,
            "categories" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "100",
                  "doc_count" : 1
                },
                {
                  "key" : "200",
                  "doc_count" : 1
                },
                {
                  "key" : "Does not contain this category", // The "missing" bucket I wish to add
                  "doc_count" : 1
                }
              ]
            }
          },
          {
            "key" : "20",
            "doc_count" : 1,
            "categories" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "300",
                  "doc_count" : 1
                },
                {
                  "key" : "Does not contain this category",
                  "doc_count" : 2
                }
              ]
            }
          }
        ]
      }
    }
  } 
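One possible approach, offered as a sketch under assumptions rather than a confirmed answer: add a reverse_nested sub-aggregation under the categories.category terms aggregation. Inside the nested context a bucket's doc_count counts nested category entries, while reverse_nested joins back to the root documents and counts how many documents carry that category. The number of documents missing a category is then the total hit count minus that value, computed on the client below. The aggregation name docs_with_category and the helpers build_query and missing_counts are hypothetical, and the sample response is hand-written to mirror the three example documents, not real Elasticsearch output. The request body is shown as a plain Python dict so it can be sent with any client.

```python
# Sketch: per-category "missing" counts via reverse_nested plus a client-side
# subtraction. The names docs_with_category, build_query and missing_counts
# are hypothetical, chosen for this example.


def build_query() -> dict:
    """Request body: terms on categories.category inside the nested context,
    with a reverse_nested sub-aggregation counting root documents per category."""
    return {
        "size": 0,
        "aggregations": {
            "category": {
                "nested": {"path": "categories"},
                "aggregations": {
                    "category": {
                        "terms": {"field": "categories.category", "size": 50},
                        "aggregations": {
                            # Joins back to the root documents, so doc_count here
                            # is the number of documents that have this category.
                            "docs_with_category": {"reverse_nested": {}}
                        },
                    }
                },
            }
        },
    }


def missing_counts(response: dict) -> dict:
    """Map each category key to the number of documents that do not have it."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ returns {"value": n, "relation": ...}; older versions an int.
    if isinstance(total, dict):
        total = total["value"]
    buckets = response["aggregations"]["category"]["category"]["buckets"]
    return {
        bucket["key"]: total - bucket["docs_with_category"]["doc_count"]
        for bucket in buckets
    }


# Hand-written response mirroring the three example documents (not real output).
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "category": {
            "doc_count": 3,
            "category": {
                "buckets": [
                    {"key": "10", "doc_count": 2, "docs_with_category": {"doc_count": 2}},
                    {"key": "20", "doc_count": 1, "docs_with_category": {"doc_count": 1}},
                ]
            },
        }
    },
}

print(missing_counts(sample_response))  # {'10': 1, '20': 2}
```

The resulting numbers could then be appended as a "Does not contain this category" bucket to each category's field buckets to produce the desired shape. Note that in Elasticsearch 7 and later hits.total is an object and may only be a lower bound for large result sets unless track_total_hits is set to true.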

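I tried setting the "missing" parameter on the terms aggregations for categories.category and categories.field, but neither works the way I want. I also tried adding a missing aggregation both inside and outside the nested aggregation, but it always returns the total number of documents. Also, is there a good way to query for documents that do not have a specific category?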
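For the last question, a common pattern is a bool query whose must_not clause wraps a nested query on the category. A document is returned when none of its categories entries match, which also covers documents with no categories at all, such as doc_id 3. This is a sketch assuming categories is mapped as a nested field and categories.category is a keyword field, as the terms aggregations above imply; the helper name docs_without_category is hypothetical.

```python
# Sketch: request body matching documents that lack a given category.
# docs_without_category is a hypothetical helper name; the mapping of
# "categories" as a nested field is an assumption.


def docs_without_category(category: str) -> dict:
    """Build a query that excludes every document having a nested
    categories entry whose category equals the given value."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {
                        "nested": {
                            "path": "categories",
                            "query": {"term": {"categories.category": category}},
                        }
                    }
                ]
            }
        }
    }


# For the example documents, category "20" should match doc_id 1 and doc_id 3.
print(docs_without_category("20"))
```

Placed in a top-level filter aggregation instead of the query, the same must_not clause would count the documents missing one known category directly in Elasticsearch, at the cost of one filter per category.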

Tags: elasticsearch, nested, aggregation

Solution

