
Problem description

I have the following events table.

key time_stamp  geohash
k1  1           thred0y
k2  5           thred0v
k4  7           thre6rd
k3  9           thre6rg
k1  10          thred3t
k1  12          thred3u
k2  14          thred3s

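I want to cluster events into groups when they fall within 500 m of each other inside a 10-minute interval.

I tried cross joining the table with itself: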

select a.key, b.key, a.geohash, b.geohash, a.time_stamp, b.time_stamp,
  round(ST_Distance(ST_PointFromGeoHash(a.geohash, 4326), ST_PointFromGeoHash(b.geohash, 4326), true)) distance,
  abs(round(extract(EPOCH from a.time_stamp - b.time_stamp)/60))
from t a, t b
where a.key <> b.key
  and a.time_stamp between b.time_stamp - interval '10 min' and b.time_stamp + interval '10 min'
  and ST_Distance(ST_PointFromGeoHash(a.geohash, 4326), ST_PointFromGeoHash(b.geohash, 4326), true) <= 500
  and least(a.key, b.key) = a.key
order by a.time_stamp desc

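However, this query only works for small data sets. It also only handles the case of two distinct keys, not more than two.

Any input on how to proceed further would be helpful.

I have added some sample data for testing: https://pastebin.com/iVD1WU4Y

Tags: postgresql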

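Solution

I found a solution by clustering keys that occur within 60 minutes of each other and within about 1.2 km of each other (approximated by sharing the first 6 characters of the geohash).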

with x as (
-- cluster_id is a running count of cluster starts: the first row, or a gap of more than 60 minutes
select key, time_stamp, geo, prev_ts, geo_hash6,
 count(case when prev_ts is null or prev_ts > 60 then 1 else null end) over(order by time_stamp) cluster_id
from (
    select key, time_stamp, geo,
        -- minutes since the previous event (EPOCH yields seconds, hence the division by 60)
        EXTRACT(EPOCH FROM time_stamp - lag(time_stamp) over(order by time_stamp)) / 60 prev_ts,
        -- events sharing a 6-character geohash prefix lie in the same cell
        substring(geo, 1, 6) geo_hash6
    from t
) a
order by cluster_id, geo_hash6, geo, time_stamp)
-- keep only (cluster, cell) pairs seen by more than 2 distinct keys, then summarize per key
select x.cluster_id, x.key, x.geo_hash6, min(time_stamp) first_time, max(time_stamp) last_time
from x, (select cluster_id, geo_hash6, count(distinct key) num_uniques from x group by cluster_id, geo_hash6) y
where x.cluster_id = y.cluster_id and x.geo_hash6 = y.geo_hash6 and y.num_uniques > 2
group by x.cluster_id, x.geo_hash6, x.key
order by x.cluster_id, x.geo_hash6;
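
The cluster_id step in the query above is an instance of the "gaps and islands" pattern: compute the gap to the previous row with lag(), then take a running count of the rows that start a new cluster. The following is only a minimal sketch of that one step, run against the seven sample rows from the question. It makes several assumptions that are not in the original: time_stamp is treated as a plain number of minutes (as it appears in the sample table) rather than a real timestamp, the column is named geohash as in the question rather than geo, and the 3-minute threshold is a hypothetical value chosen only so that the tiny sample splits into more than one cluster.

```sql
-- Sample rows from the question; time_stamp is treated as plain minutes (an assumption).
with t(key, time_stamp, geohash) as (
    values ('k1',  1, 'thred0y'),
           ('k2',  5, 'thred0v'),
           ('k4',  7, 'thre6rd'),
           ('k3',  9, 'thre6rg'),
           ('k1', 10, 'thred3t'),
           ('k1', 12, 'thred3u'),
           ('k2', 14, 'thred3s')
),
gaps as (
    -- gap_min is null for the first row, otherwise minutes since the previous event
    select key, time_stamp, geohash,
           time_stamp - lag(time_stamp) over (order by time_stamp) as gap_min
    from t
)
select key, time_stamp, gap_min,
       -- running count of cluster starts: the first row, or a gap above the
       -- hypothetical 3-minute threshold
       count(case when gap_min is null or gap_min > 3 then 1 end)
           over (order by time_stamp) as cluster_id,
       -- 6-character geohash prefix used as the spatial bucket
       substring(geohash, 1, 6) as geo_hash6
from gaps
order by time_stamp;
```

With this threshold, the event at minute 1 gets cluster_id 1. The 4-minute gap before minute 5 starts cluster 2, which contains all remaining rows. Note that bucketing by geohash prefix is only an approximation of distance: two points that are close together but lie on opposite sides of a cell boundary end up in different buckets.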

Any suggestions for improving this solution are welcome.

