Elasticsearch - Aggregate by last timestamp, plus filtering, sorting, and pagination

Problem description

I have an Elasticsearch index containing entries with the following structure:

{
    // id of the entry
    "EntryId": integer,

    // The timestamp of the entry being processed, usually there are multiple entries with different "ChangeDate" for the same "EntryId"
    // formatted as a UTC date, always in the same timezone
    "ChangeDate": date,

    // Other fields
    "Field1": integer,
    "Field2": text,
    ...
}

What I want to achieve is to query the entries and get all of the following at the same time:

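Below is my solution, with comments inline. Its limitation is that pagination inside top_hits allows only the top 100 results by default. I can raise that limit by setting [index.max_inner_result_window], but it remains a hard limit: users cannot page to the end of an arbitrarily large result set.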
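To make the limit concrete: a top_hits request is rejected when from + size exceeds index.max_inner_result_window (100 by default). The Python sketch below only does that arithmetic and builds a settings body; the helper names page_fits, last_reachable_page, and window_settings_body are hypothetical, and the settings body is assumed to be sent with PUT /index-1/_settings.

```python
# Elasticsearch default for index.max_inner_result_window
DEFAULT_MAX_INNER_RESULT_WINDOW = 100


def page_fits(page: int, page_size: int,
              window: int = DEFAULT_MAX_INNER_RESULT_WINDOW) -> bool:
    """True if a zero-based page satisfies from + size <= window."""
    offset = page * page_size  # this is the "from" value of top_hits
    return offset + page_size <= window


def last_reachable_page(page_size: int,
                        window: int = DEFAULT_MAX_INNER_RESULT_WINDOW) -> int:
    """Zero-based index of the last full page that fits, or -1 if none does."""
    return window // page_size - 1


def window_settings_body(window: int) -> dict:
    """Body for an index settings update that raises the limit (hypothetical helper)."""
    return {"index": {"max_inner_result_window": window}}


if __name__ == "__main__":
    print(page_fits(6, 10))         # from=60, size=10, as in the request in this post
    print(last_reachable_page(10))  # last page reachable with the default window
    print(window_settings_body(1000))
```

Raising the window only moves the ceiling: whatever value is chosen, pages beyond it stay unreachable, which is the hard limit described above.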

My question is:

Solution:

GET /index-1/_search
{
    "size": 0,
    // This sorting prepares the correct order for the following "collapse" below, i.e. last ChangeDate goes first
    "sort": [{ "ChangeDate": "desc" }],
    "query" : {
        // This allows filtering on arbitrary fields
        "bool": {
            "must": [
                {"range": {"Field1": {"gte": 30000}}}
            ]
        }
    },
    // This chooses only one entry with the latest ChangeDate among all entries with the same EntryId
    "collapse" : {
        "field" : "EntryId"
    },
    "aggs": {
        "TopHits": {
            "top_hits": {
                // This forces sorting of the result set first by Field1 then by EntryId
                "sort": [{"Field1": {"order": "asc"}}, {"EntryId": {"order": "asc"}}],
                // This is result set pagination
                "from": 60,
                "size": 10,
                // This includes the whole entry in the result source
                "_source": { "includes": ["*"]}
            }
        }
    }
}
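As a reference for what the comments in the request describe, here is a small in-memory Python model of the intended result: filter, keep the latest ChangeDate per EntryId, sort by Field1 then EntryId, and slice out one page. It is a sketch of the intent, not of what Elasticsearch returns; the function name latest_page and the sample entries are made up for illustration, and applying the filter before picking the latest entry is an assumption that mirrors the order in the request. It is worth checking the real request against such a model on a test index, because the Elasticsearch documentation states that collapsing applies to the top hits only and does not affect aggregations.

```python
from datetime import datetime


def latest_page(entries, min_field1, offset, size):
    """Model of the intended query: filter, latest per EntryId, sort, paginate."""
    # 1. Filter on an arbitrary field (here: Field1 >= min_field1).
    matching = [e for e in entries if e["Field1"] >= min_field1]

    # 2. Keep only the entry with the latest ChangeDate for each EntryId.
    latest = {}
    for e in matching:
        current = latest.get(e["EntryId"])
        if current is None or e["ChangeDate"] > current["ChangeDate"]:
            latest[e["EntryId"]] = e

    # 3. Sort by Field1, then by EntryId, both ascending.
    ordered = sorted(latest.values(), key=lambda e: (e["Field1"], e["EntryId"]))

    # 4. Paginate with from/size semantics.
    return ordered[offset:offset + size]


# Hypothetical sample data, for illustration only.
SAMPLE = [
    {"EntryId": 1, "ChangeDate": datetime(2021, 1, 1), "Field1": 30000},
    {"EntryId": 1, "ChangeDate": datetime(2021, 1, 2), "Field1": 40000},
    {"EntryId": 2, "ChangeDate": datetime(2021, 1, 1), "Field1": 35000},
    {"EntryId": 3, "ChangeDate": datetime(2021, 1, 1), "Field1": 10000},
]

if __name__ == "__main__":
    for entry in latest_page(SAMPLE, min_field1=30000, offset=0, size=10):
        print(entry["EntryId"], entry["ChangeDate"].date(), entry["Field1"])
```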

Tags: elasticsearch
