How do I run a top_hits aggregation on the latest record of each unique ID per day under a date_histogram aggregation?

Problem description

I have a set of about 10K ids and documents of the following shape (in the real case I have about 500K documents, so I have simplified them here):

{"id":"Peter","sales":12679, "timestamp": "2021-04-22 13:03:46.972"}
{"id":"Peter","sales":12375, "timestamp": "2021-04-21 13:03:46.972"}
{"id":"Peter","sales":32124, "timestamp": "2021-04-20 17:03:46.972"}
{"id":"Peter","sales":12472, "timestamp": "2021-04-20 13:03:46.972"}
{"id":"Peter","sales":42679, "timestamp": "2021-04-18 14:03:46.972"}
{"id":"Peter","sales":12379, "timestamp": "2021-04-18 13:03:46.972"}
....
{"id":"John","sales":2256679, "timestamp": "2021-04-2 13:03:46.972"}
{"id":"John","sales":752375, "timestamp": "2021-04-1 13:03:46.972"}
{"id":"John","sales":85124, "timestamp": "2021-04-10 17:03:46.972"}
{"id":"John","sales":1472, "timestamp": "2021-04-10 13:03:46.972"}
{"id":"John","sales":4279, "timestamp": "2021-04-18 14:03:46.972"}
{"id":"John","sales":2379, "timestamp": "2021-04-18 13:03:46.972"}
....

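I want a single query that does the following:

  1. Finds the latest record of each id on each day, and
  2. Counts, for each id, the number of days on which it has no documents (no doc_count).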
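To pin down what the two tasks mean, here is a minimal plain-Python sketch that computes them over a handful of the sample documents, with no Elasticsearch involved. The helper names (`parse`, `latest_per_day`, `days_without_docs`) are made up for illustration, and "days" here means only days on which at least one document exists.

```python
from collections import defaultdict
from datetime import datetime

# A few documents copied from the sample above.
docs = [
    {"id": "Peter", "sales": 12679, "timestamp": "2021-04-22 13:03:46.972"},
    {"id": "Peter", "sales": 32124, "timestamp": "2021-04-20 17:03:46.972"},
    {"id": "Peter", "sales": 12472, "timestamp": "2021-04-20 13:03:46.972"},
    {"id": "John", "sales": 4279, "timestamp": "2021-04-18 14:03:46.972"},
    {"id": "John", "sales": 2379, "timestamp": "2021-04-18 13:03:46.972"},
]


def parse(ts):
    """Parse the timestamp format used in the sample documents."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def latest_per_day(docs):
    """Task 1: return {day: {id: latest document of that id on that day}}."""
    latest = defaultdict(dict)
    for doc in docs:
        day = parse(doc["timestamp"]).date().isoformat()
        current = latest[day].get(doc["id"])
        if current is None or parse(doc["timestamp"]) > parse(current["timestamp"]):
            latest[day][doc["id"]] = doc
    return dict(latest)


def days_without_docs(latest, ids):
    """Task 2: for each id, count the days on which it has no document."""
    return {i: sum(1 for per_id in latest.values() if i not in per_id) for i in ids}


latest = latest_per_day(docs)
missing = days_without_docs(latest, ["Peter", "John"])
```

With 500K documents this grouping belongs in Elasticsearch rather than in client code; the sketch only fixes the expected result so that a query can be checked against it.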

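In other words, I want to combine a date histogram, a top hits aggregation, and uniqueness per id to find the latest sales of each id for each day, and also to check whether any id has no documents on some day. I have tried many queries, but none of them returns what I want.

For example: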

{
        "size": 0,
        "sort": {"timestamp": "desc"},
        "query": {
            "bool": {
                "must":
                 {
                     "terms": {
                         "id": ["Peter","John"]
                     }
                 }
            }
        },
        "aggs": {
            "sales_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "1d"
                },
                "aggs": {
                    "id": {
                        "terms": {
                            "field": "id.keyword"
                        }
                    }
                }
            }
        }
}

This returns something like:

 {
                    "key": 1615852800000,
                    "doc_count": 6,
                    "id": {
                        "doc_count_error_upper_bound": 0,
                        "sum_other_doc_count": 0,
                        "buckets": [
                            {
                                "key": "624232532",
                                "doc_count": 4
                            },
                            {
                                "key": "656625970",
                                "doc_count": 2
                            }
                        ]
                    }
                },

After this query, I still have to check whether some of the ids have no doc_count on a given day.
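One way to do that check is on the client, by walking the daily buckets and noting which expected ids are absent from each. The sketch below assumes the bucket shape shown in the response above. The function name `count_missing_days`, the second daily bucket, and the list of expected ids are hypothetical.

```python
def count_missing_days(day_buckets, expected_ids):
    """For each expected id, count the daily buckets in which it has no documents."""
    missing = {i: 0 for i in expected_ids}
    for day in day_buckets:
        # Ids that actually have documents on this day.
        present = {b["key"] for b in day["id"]["buckets"] if b["doc_count"] > 0}
        for i in expected_ids:
            if i not in present:
                missing[i] += 1
    return missing


# Hypothetical list of daily buckets; the first one mirrors the response above.
day_buckets = [
    {"key": 1615852800000, "doc_count": 6, "id": {"buckets": [
        {"key": "624232532", "doc_count": 4},
        {"key": "656625970", "doc_count": 2},
    ]}},
    {"key": 1615939200000, "doc_count": 1, "id": {"buckets": [
        {"key": "624232532", "doc_count": 1},
    ]}},
]

missing = count_missing_days(day_buckets, ["624232532", "656625970", "624232499"])
```

Note that the terms aggregation returns only the top 10 buckets by default, so with roughly 10K ids its `size` would have to be raised (or the ids paged through) before a count like this is reliable.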

To get the latest record of each id for the day, I tried:

{
        "size": 0,
        "sort": {"timestamp": "desc"},
        "query": {
            "bool": {
                "must":
                 {
                     "terms": {
                         "oneNetDevieId": [656625970,624232532,624232499]
                     }
                 }
            }
        },
        "aggs": {
            "sales_over_time": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": "1d"
                },
                "aggs": {
                    "name": {
                        "terms": { "field": "oneNetDevieId.keyword" },
                        "aggs": {
                            "latest_comment": {
                                "top_hits": {
                                    "sort": [ {"timestamp": { "order": "desc" } } ],
                                    "size": 1
                                }
                            }
                        }
                    }
                }
            }
        }
}

How can I then compute a sum over these latest records for each id on each day?
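If the goal is to add up, per day, the latest `sales` value of every id, one option is to do the sum on the client from the top_hits response. The sketch below assumes the aggregation names used in the query above (`sales_over_time`, `name`, `latest_comment`) and that each hit's `_source` carries a numeric `sales` field. The function name `sum_latest_sales` and the sample response, including its `key_as_string` values, are hypothetical.

```python
def sum_latest_sales(response):
    """Return {day: sum of the latest `sales` value of each id on that day}."""
    totals = {}
    for day in response["aggregations"]["sales_over_time"]["buckets"]:
        total = 0
        for id_bucket in day["name"]["buckets"]:
            hits = id_bucket["latest_comment"]["hits"]["hits"]
            if hits:  # top_hits with size 1 yields at most one hit per id
                total += hits[0]["_source"]["sales"]
        totals[day["key_as_string"]] = total
    return totals


# Hypothetical response, trimmed to the fields the function reads.
response = {"aggregations": {"sales_over_time": {"buckets": [
    {"key_as_string": "2021-04-18T00:00:00.000Z", "name": {"buckets": [
        {"key": "Peter", "latest_comment": {"hits": {"hits": [
            {"_source": {"id": "Peter", "sales": 42679}}]}}},
        {"key": "John", "latest_comment": {"hits": {"hits": [
            {"_source": {"id": "John", "sales": 4279}}]}}},
    ]}},
    {"key_as_string": "2021-04-20T00:00:00.000Z", "name": {"buckets": [
        {"key": "Peter", "latest_comment": {"hits": {"hits": [
            {"_source": {"id": "Peter", "sales": 32124}}]}}},
    ]}},
]}}}

totals = sum_latest_sales(response)
```

Summing on the client keeps the query unchanged; whether the same total can be produced inside Elasticsearch depends on the version and on which pipeline aggregations it supports, so that is left open here.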

Tags: elasticsearch

Solution
