elasticsearch - How to get a "missing" aggregation bucket for a nested field
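Problem description
I am trying to get a "missing" bucket in a nested aggregation in ES. The goal is to return how many documents do not have a given category set.
Here are some (simplified) example documents: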
[
  {
    "doc_id": 1,
    "categories": [
      {
        "field": "100",
        "category": "10"
      }
    ]
  },
  {
    "doc_id": 2,
    "categories": [
      {
        "field": "200",
        "category": "10"
      },
      {
        "field": "300",
        "category": "20"
      }
    ]
  },
  {
    "doc_id": 3
  }
]
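I want to see how many documents have a category, which categories those are, and how many times each field within a category was selected. So I run a nested aggregation query like this: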
"aggregations": {
  "category": {
    "nested": {
      "path": "categories"
    },
    "aggregations": {
      "category": {
        "terms": {
          "field": "categories.category",
          "size": 50,
          "shard_size": 2147483647,
          "min_doc_count": 1,
          "shard_min_doc_count": 0,
          "show_term_doc_count_error": false,
          "order": [
            {
              "_count": "desc"
            },
            {
              "_key": "asc"
            }
          ]
        },
        "aggregations": {
          "categories": {
            "terms": {
              "field": "categories.field",
              "size": 50,
              "shard_size": 2147483647,
              "min_doc_count": 1,
              "shard_min_doc_count": 0,
              "show_term_doc_count_error": false,
              "order": [
                {
                  "_count": "desc"
                },
                {
                  "_key": "asc"
                }
              ]
            }
          }
        }
      }
    }
  }
}
This gives a response like this:
"hits" : {
  "total" : 3,
  "max_score" : 0.0,
  "hits" : [ ]
},
"aggregations" : {
  "category" : {
    "doc_count" : 3, // This is the amount of categories set, so this can exceed the total hits
    "category" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10", // Category id
          "doc_count" : 2, // Amount of documents set with this category
          "categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "100", // Field id
                "doc_count" : 1 // Amount of documents set with this field
              },
              {
                "key" : "200",
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "20",
          "doc_count" : 1,
          "categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "300",
                "doc_count" : 1
              }
            ]
          }
        }
      ]
    }
  }
}
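To see why the outer doc_count is 3 even though only two documents carry categories, it helps to remember that a nested aggregation counts nested objects, not root documents. The following is a small local simulation in plain Python (no Elasticsearch involved, and only an approximation of what the server does) over the three sample documents:

```python
from collections import Counter

# The three sample documents from the question.
docs = [
    {"doc_id": 1, "categories": [{"field": "100", "category": "10"}]},
    {
        "doc_id": 2,
        "categories": [
            {"field": "200", "category": "10"},
            {"field": "300", "category": "20"},
        ],
    },
    {"doc_id": 3},
]

# A nested aggregation works on the nested objects themselves,
# so doc_id 3 (which has none) contributes nothing at all.
nested_objects = [c for d in docs for c in d.get("categories", [])]
nested_doc_count = len(nested_objects)

# Equivalent of the terms aggregation on categories.category.
category_counts = Counter(c["category"] for c in nested_objects)

print(nested_doc_count)       # 3
print(dict(category_counts))  # {'10': 2, '20': 1}
```

Because doc_id 3 never enters the nested context, there is nothing inside that context for a "missing" setting to count.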
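Is there a way to include a bucket holding the number of documents that do not have a given category set? An example of the desired response: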
"hits" : {
  "total" : 3,
  "max_score" : 0.0,
  "hits" : [ ]
},
"aggregations" : {
  "category" : {
    "doc_count" : 3,
    "category" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "10",
          "doc_count" : 2,
          "categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "100",
                "doc_count" : 1
              },
              {
                "key" : "200",
                "doc_count" : 1
              },
              {
                "key" : "Does not contain this category", // The "missing" bucket I wish to add
                "doc_count" : 1
              }
            ]
          }
        },
        {
          "key" : "20",
          "doc_count" : 1,
          "categories" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "300",
                "doc_count" : 1
              },
              {
                "key" : "Does not contain this category",
                "doc_count" : 2
              }
            ]
          }
        }
      ]
    }
  }
}
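The numbers in the desired response (1 for category "10", 2 for category "20") are consistent with reading the extra bucket as "root documents that have no nested object in this category". A minimal local sketch of that definition, using a hypothetical helper name missing_per_category and the sample documents only:

```python
# The three sample documents from the question.
docs = [
    {"doc_id": 1, "categories": [{"field": "100", "category": "10"}]},
    {
        "doc_id": 2,
        "categories": [
            {"field": "200", "category": "10"},
            {"field": "300", "category": "20"},
        ],
    },
    {"doc_id": 3},
]


def missing_per_category(documents):
    """For each category, count root documents that do NOT contain it."""
    categories = sorted(
        {c["category"] for d in documents for c in d.get("categories", [])}
    )
    result = {}
    for cat in categories:
        # Count each root document at most once, however many
        # nested objects it has in this category.
        having = sum(
            1
            for d in documents
            if any(c["category"] == cat for c in d.get("categories", []))
        )
        result[cat] = len(documents) - having
    return result


print(missing_per_category(docs))  # {'10': 1, '20': 2}
```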
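I tried setting the "missing" parameter on the terms aggregations for categories.category and categories.field, but neither worked the way I wanted. I also tried adding a missing aggregation both inside and outside the nested aggregation, but those always return the total number of documents. Also, is there a good way to query for documents that do not have a specific category?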
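For the last part (finding documents without a specific category), one common pattern is to wrap a nested query in bool.must_not. The sketch below builds such a request body as a Python dict. It assumes that categories is mapped with type nested and that categories.category is an exact-value (keyword-like) field. The function name without_category_query is made up for illustration.

```python
import json


def without_category_query(category_id):
    """Build a search body matching root documents that have no nested
    'categories' object with the given category id."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {
                        "nested": {
                            "path": "categories",
                            "query": {
                                "term": {"categories.category": category_id}
                            },
                        }
                    }
                ]
            }
        }
    }


# Print the JSON body that would be sent to the _search endpoint.
print(json.dumps(without_category_query("10"), indent=2))
```

With the sample documents, a query built for category "10" would be expected to match only doc_id 3, since documents with no nested objects at all also satisfy the must_not clause.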
Solution
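One possible approach, offered as a sketch under assumptions rather than as a confirmed answer: inside the nested context there is no entry for a document that lacks a category, so the count has to come from the root level. Adding a reverse_nested sub-aggregation under each category bucket gives the number of root documents that have that category, and the "does not contain" count is then the total hit count minus that number, computed on the client. The aggregation name root_docs and both helper functions are hypothetical. The code also assumes hits.total is exact (on Elasticsearch 7 and later this may require track_total_hits).

```python
def build_query():
    """The question's aggregation (trimmed), plus a per-category root-document counter."""
    return {
        "size": 0,
        "aggregations": {
            "category": {
                "nested": {"path": "categories"},
                "aggregations": {
                    "category": {
                        "terms": {"field": "categories.category", "size": 50},
                        "aggregations": {
                            "categories": {
                                "terms": {"field": "categories.field", "size": 50}
                            },
                            # Hypothetical name: steps back to the root level so
                            # each document is counted once per category bucket.
                            "root_docs": {"reverse_nested": {}},
                        },
                    }
                },
            }
        },
    }


def total_hits(response):
    """hits.total is an integer in older versions and an object in newer ones."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def add_missing_buckets(response, label="Does not contain this category"):
    """Append a client-side 'missing' bucket to every category's field buckets."""
    total = total_hits(response)
    for bucket in response["aggregations"]["category"]["category"]["buckets"]:
        missing = total - bucket["root_docs"]["doc_count"]
        bucket["categories"]["buckets"].append({"key": label, "doc_count": missing})
    return response
```

The body from build_query() would be sent to the _search endpoint and the parsed response passed to add_missing_buckets. If the search also has a query clause, the subtraction is relative to the documents matched by that query, because both numbers come from the same result set.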