ElasticSearch - Combine filters and a composite query to get unique field combinations

Problem description

Well... I'm very much a "newbie" with ES, so when it comes to aggregations... there's no word in the dictionary to describe my level :p

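Today I'm facing a problem: I'm trying to build a query that does something similar to SQL DISTINCT, but on top of filters. I have this document (an abstraction of the real one, of course):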

{
  "id": "1",
  "createdAt": 1626783747,
  "updatedAt": 1626783747,
  "isAvailable": true,
  "kind": "document",
  "classification": {
    "id": 1,
    "name": "a_name_for_id_1"
  },
  "structure": {
    "material": "cartoon",
    "thickness": 5
  },
  "shared": true,
  "objective": "stackoverflow"
}

Since any of the values in the document above can vary, some of them can repeat across documents, for example classification.id, kind and structure.material.

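So, to meet my requirement, I want to "group by" these three fields so that I get each unique combination once. Going a bit deeper, with the following data I should get the following possibilities: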

[{
        "id": "1",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 1,
            "name": "a_name_for_id_1"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": true,
        "objective": "stackoverflow"
    },
    {
        "id": "2",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "linkedin"
    },
    {
        "id": "3",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 2,
            "name": "a_name_for_id_2"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": false,
        "objective": "tiktok"
    },
    {
        "id": "4",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "cartoon",
            "thickness": 5
        },
        "shared": false,
        "objective": "snapchat"
    },
    {
        "id": "5",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": true,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "paper",
            "thickness": 1
        },
        "shared": true,
        "objective": "twitter"
    },
    {
        "id": "6",
        "createdAt": 1626783747,
        "updatedAt": 1626783747,
        "isAvailable": false,
        "kind": "document",
        "classification": {
            "id": 3,
            "name": "a_name_for_id_3"
        },
        "structure": {
            "material": "iron",
            "thickness": 3
        },
        "shared": true,
        "objective": "facebook"
    }
]

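Based on the above, I should get the following combinations in the "buckets":

  • document 1 cartoon
  • document 2 iron
  • document 2 paper
  • document 3 cartoon
  • document 3 paper
  • document 3 iron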

Of course, that's just for the sake of this example (to keep things simple, there are no duplicates yet).
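To make the "DISTINCT" idea concrete, here is a minimal Python sketch (not Elasticsearch code) that computes the same unique kind / classification.id / structure.material combinations locally over the six sample documents. DOCS and distinct_combinations are illustrative names, and the documents are trimmed to the three grouped fields.

```python
from typing import Any

# The six sample documents from the question, trimmed to the grouped fields.
DOCS: list[dict[str, Any]] = [
    {"id": "1", "kind": "document", "classification": {"id": 1}, "structure": {"material": "cartoon"}},
    {"id": "2", "kind": "document", "classification": {"id": 2}, "structure": {"material": "iron"}},
    {"id": "3", "kind": "document", "classification": {"id": 2}, "structure": {"material": "paper"}},
    {"id": "4", "kind": "document", "classification": {"id": 3}, "structure": {"material": "cartoon"}},
    {"id": "5", "kind": "document", "classification": {"id": 3}, "structure": {"material": "paper"}},
    {"id": "6", "kind": "document", "classification": {"id": 3}, "structure": {"material": "iron"}},
]


def distinct_combinations(docs: list[dict[str, Any]]) -> list[tuple[str, int, str]]:
    """Illustrative helper (hypothetical name): local equivalent of
    SELECT DISTINCT kind, classification.id, structure.material."""
    unique = {
        (d["kind"], d["classification"]["id"], d["structure"]["material"])
        for d in docs
    }
    return sorted(unique)  # sorted only to make the output stable


for combo in distinct_combinations(DOCS):
    print(combo)
```

Because every document in the sample has a different combination, all six come back; duplicates would simply collapse into one tuple.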

But, most importantly, I need some "pre-filters" (on isAvailable, structure.thickness and shared):

Compared with the first set of results, I should then only get the following combination:

  • document 2 iron

If you're still reading, well... thank you! xD

So, as you can see, I need every possible combination of the fields kind <> classification_id <> structure_material that matches a static pattern on the filters isAvailable, thickness and shared.

As for the output, the hits don't matter to me since I don't need the documents, only the kind <> classification_id <> structure_material combinations :)
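Adding the pre-filters to a local sketch shows which combination should survive. The filter values are an assumption taken from the query in the solution (isAvailable = true, 2 <= structure.thickness <= 4, shared = true), and matches_prefilters / filtered_combinations are hypothetical helpers; in Elasticsearch this filtering belongs in the bool filter clause, not in application code.

```python
from typing import Any

# Sample documents from the question, trimmed to the fields used here.
DOCS: list[dict[str, Any]] = [
    {"id": "1", "isAvailable": True, "shared": True, "kind": "document",
     "classification": {"id": 1}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "2", "isAvailable": True, "shared": True, "kind": "document",
     "classification": {"id": 2}, "structure": {"material": "iron", "thickness": 3}},
    {"id": "3", "isAvailable": False, "shared": False, "kind": "document",
     "classification": {"id": 2}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "4", "isAvailable": True, "shared": False, "kind": "document",
     "classification": {"id": 3}, "structure": {"material": "cartoon", "thickness": 5}},
    {"id": "5", "isAvailable": True, "shared": True, "kind": "document",
     "classification": {"id": 3}, "structure": {"material": "paper", "thickness": 1}},
    {"id": "6", "isAvailable": False, "shared": True, "kind": "document",
     "classification": {"id": 3}, "structure": {"material": "iron", "thickness": 3}},
]


def matches_prefilters(doc: dict[str, Any]) -> bool:
    # Assumed filter values, mirroring the bool/filter clauses of the solution's query.
    return (
        doc["isAvailable"] is True
        and 2 <= doc["structure"]["thickness"] <= 4
        and doc["shared"] is True
    )


def filtered_combinations(docs: list[dict[str, Any]]) -> list[tuple[str, int, str]]:
    """Hypothetical helper: filter first, then keep unique (kind, id, material) tuples."""
    return sorted({
        (d["kind"], d["classification"]["id"], d["structure"]["material"])
        for d in docs
        if matches_prefilters(d)
    })


print(filtered_combinations(DOCS))
```

Documents 1 and 4 fail the thickness range, 3 and 6 fail isAvailable, and 5 fails the thickness range, so only document 2's combination remains.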

Thanks for your help :)

Max

Tags: elasticsearch, elasticsearch-aggregation

Solution


Thanks to a colleague, I finally got it working as expected!

Query

GET index-latest/_search
{
   "size": 0,
   "query": {
      "bool": {
         "filter": [
            {
               "term": {
                  "isAvailable": true
               }
            },
            {
               "range": {
                  "structure.thickness": {
                     "gte": 2,
                     "lte": 4
                  }
               }
            },
            {
               "term": {
                  "shared": true
               }
            }
         ]
      }
   },
   "aggs": {
      "my_agg_example": {
         "composite": {
            "size": 10,
            "sources": [
               {
                  "kind": {
                     "terms": {
                        "field": "kind.keyword",
                        "order": "asc"
                     }
                  }
               },
               {
                  "classification_id": {
                     "terms": {
                        "field": "classification.id",
                        "order": "asc"
                     }
                  }
               },
               {
                  "structure_material": {
                     "terms": {
                        "field": "structure.material.keyword",
                        "order": "asc"
                     }
                  }
               }
            ]
         }
      }
   }
}

The result is then:

{
   "took": 11,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "skipped": 0,
      "failed": 0
   },
   "hits": {
      "total": {
         "value": 1,
         "relation": "eq"
      },
      "max_score": null,
      "hits": []
   },
   "aggregations": {
      "my_agg_example": {
         "after_key": {
            "kind": "document",
            "classification_id": 2,
            "structure_material": "iron"
         },
         "buckets": [
            {
               "key": {
                  "kind": "document",
                  "classification_id": 2,
                  "structure_material": "iron"
               },
               "doc_count": 1
            }
         ]
      }
   }
}
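The composite aggregation returns at most size buckets per request (10 in the query above) plus an after_key. To collect every combination when there are more, the usual approach is to resend the same body with composite.after set to the previous after_key until a page comes back empty. Below is a minimal Python sketch of that loop; iter_composite_buckets, SearchFn and fake_search are hypothetical names, and fake_search is an in-memory stand-in for a real cluster so the example runs offline.

```python
import copy
from typing import Any, Callable, Iterator

# A search function takes a request body and returns the parsed JSON response.
# In practice it would wrap your HTTP or Elasticsearch client (not shown here).
SearchFn = Callable[[dict[str, Any]], dict[str, Any]]


def iter_composite_buckets(
    search: SearchFn, body: dict[str, Any], agg_name: str = "my_agg_example"
) -> Iterator[dict[str, Any]]:
    """Yield every bucket of a composite aggregation, following after_key page by page."""
    body = copy.deepcopy(body)  # keep the caller's body untouched
    composite = body["aggs"][agg_name]["composite"]
    while True:
        agg = search(body)["aggregations"][agg_name]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            return
        composite["after"] = after_key  # request the page after the last key


# Fake in-memory "cluster" holding five combinations (hypothetical data).
ALL_BUCKETS = [
    {"key": {"kind": "document", "classification_id": i, "structure_material": "iron"}, "doc_count": 1}
    for i in range(1, 6)
]


def fake_search(body: dict[str, Any]) -> dict[str, Any]:
    composite = body["aggs"]["my_agg_example"]["composite"]
    start = 0
    if "after" in composite:
        keys = [b["key"] for b in ALL_BUCKETS]
        start = keys.index(composite["after"]) + 1
    page = ALL_BUCKETS[start:start + composite["size"]]
    agg: dict[str, Any] = {"buckets": page}
    if page:
        agg["after_key"] = page[-1]["key"]
    return {"aggregations": {"my_agg_example": agg}}


BODY = {"size": 0, "aggs": {"my_agg_example": {"composite": {"size": 2, "sources": []}}}}
print([b["key"]["classification_id"] for b in iter_composite_buckets(fake_search, BODY)])
```

With a page size of 2 the loop makes four requests (2 + 2 + 1 + 0 buckets) and stops on the empty page.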

So, as we can see, we get the following bucket:

{
    "key": {
        "kind": "document",
        "classification_id": 2,
        "structure_material": "iron"
    },
    "doc_count": 1
}
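Since only the combinations matter (not the hits), a few lines are enough to flatten the buckets into tuples. A minimal Python sketch, assuming the response has already been parsed from JSON into a dict; combinations_from_response is an illustrative name and RESPONSE is the response above, trimmed.

```python
from typing import Any

# The response from the solution, trimmed to the parts used here.
RESPONSE: dict[str, Any] = {
    "hits": {"total": {"value": 1, "relation": "eq"}, "max_score": None, "hits": []},
    "aggregations": {
        "my_agg_example": {
            "after_key": {"kind": "document", "classification_id": 2, "structure_material": "iron"},
            "buckets": [
                {"key": {"kind": "document", "classification_id": 2, "structure_material": "iron"},
                 "doc_count": 1}
            ],
        }
    },
}

SOURCES = ("kind", "classification_id", "structure_material")


def combinations_from_response(
    response: dict[str, Any],
    agg_name: str = "my_agg_example",
    sources: tuple[str, ...] = SOURCES,
) -> list[tuple[Any, ...]]:
    """Illustrative helper: turn composite buckets into tuples in source order (doc_count dropped)."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [tuple(bucket["key"][name] for name in sources) for bucket in buckets]


print(combinations_from_response(RESPONSE))
```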

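Note: watch out for your field types... putting .keyword on classification.id gave no results in the buckets. .keyword should only be used with string types (as far as I understand; correct me if I'm wrong). With Elasticsearch's default dynamic mapping, JSON strings are indexed as a text field plus a keyword sub-field named keyword, while numbers such as classification.id get a numeric type with no such sub-field, so classification.id.keyword points to a field that doesn't exist and the composite source finds no values.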
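One way to avoid the .keyword mistake is to derive the aggregation field from the index mapping instead of guessing. The sketch below assumes a mapping produced by default dynamic mapping (strings as text with a keyword sub-field, integers as long); PROPERTIES is a hand-written excerpt in that shape, not the real mapping of index-latest, and aggregatable_field is a hypothetical helper.

```python
from typing import Any

# Hand-written excerpt in the shape of GET <index>/_mapping "properties",
# assuming default dynamic mapping (not the real index-latest mapping).
KEYWORD_SUBFIELD = {"keyword": {"type": "keyword", "ignore_above": 256}}
PROPERTIES: dict[str, Any] = {
    "kind": {"type": "text", "fields": KEYWORD_SUBFIELD},
    "classification": {"properties": {
        "id": {"type": "long"},
        "name": {"type": "text", "fields": KEYWORD_SUBFIELD},
    }},
    "structure": {"properties": {
        "material": {"type": "text", "fields": KEYWORD_SUBFIELD},
        "thickness": {"type": "long"},
    }},
}


def aggregatable_field(properties: dict[str, Any], path: str) -> str:
    """Hypothetical helper: 'path.keyword' for text fields with a keyword
    sub-field, the plain path for everything else (long, boolean, keyword...)."""
    node: dict[str, Any] = {"properties": properties}
    for part in path.split("."):
        node = node["properties"][part]  # walk object fields down the dotted path
    if node.get("type") == "text":
        if "keyword" in node.get("fields", {}):
            return path + ".keyword"
        raise ValueError(f"{path} is a text field without a keyword sub-field")
    return path


for p in ("kind", "classification.id", "structure.material"):
    print(p, "->", aggregatable_field(PROPERTIES, p))
```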

As expected, we get the following result (compared with the original question):

  • document 2 iron

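Note: the order of the elements in aggs.<name>.composite.sources does matter: it sets the order of the keys in each bucket and the order in which the buckets are sorted in the result.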
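A simplified local model of that ordering: bucket keys are compared source by source, so reordering the sources reorders the buckets. This is a Python illustration only (composite_order is a hypothetical name); it ignores Elasticsearch specifics such as missing_bucket or a per-source "desc" order.

```python
from typing import Any

# The six combinations from the question, as they would appear in bucket keys.
ROWS: list[dict[str, Any]] = [
    {"kind": "document", "classification_id": 1, "structure_material": "cartoon"},
    {"kind": "document", "classification_id": 2, "structure_material": "iron"},
    {"kind": "document", "classification_id": 2, "structure_material": "paper"},
    {"kind": "document", "classification_id": 3, "structure_material": "cartoon"},
    {"kind": "document", "classification_id": 3, "structure_material": "paper"},
    {"kind": "document", "classification_id": 3, "structure_material": "iron"},
]


def composite_order(rows: list[dict[str, Any]], source_names: list[str]) -> list[tuple[Any, ...]]:
    """Simplified model: build each key in source order and sort ascending, source by source."""
    return sorted(tuple(row[name] for name in source_names) for row in rows)


by_kind = composite_order(ROWS, ["kind", "classification_id", "structure_material"])
by_material = composite_order(ROWS, ["structure_material", "kind", "classification_id"])
print(by_kind[:2])
print(by_material[:2])
```

The same six combinations come back either way, but with structure_material as the first source the buckets are grouped by material first.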

Thanks!

