sql - SQL/BigQuery:如何避免对一个组的多个非连续成员进行分组?
问题描述
我遇到了一个我自己似乎无法解决的问题。我按位置和时间戳对行进行分组,并为实体保持静止的实例查找第一个和最后一个时间戳。问题是对于我当前的代码,当实体返回到它之前的位置时,SQL 将行组合在一起。
在我的示例中,实体在 2020-05-24 05:22:00 位于位置 -66.89 10.5002,然后在 2020-05-24 11:13:00 返回到该位置。当前查询的结果使该实体看起来一直在该位置,尽管中间的行清楚地表明它已移动。这是一个概念问题,我真的不知道如何在 SQL 中解决。我在 Big Query 中执行此操作,但我记得在 SQL Server 中遇到了类似的问题。
代码:
with selection as (
select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom
union all select
1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)'
union all select
1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)'
union all select
1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)'
union all select
1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)'
union all select
1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
)
select id,timestamp_,geom,
first_value(timestamp_)
OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
OVER (PARTITION BY id,geom ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
selection order by id,timestamp_
结果。注意第一行和最后一行的 interval_start 和 interval_end
ID | 时间戳_ | 几何 | 间隔开始 | 间隔结束 |
---|---|---|---|---|
1 | 2020-05-24 05:22:00 UTC | 点(-66.89 10.5002) | 2020-05-24 05:22:00 UTC | 2020-05-24 11:13:00 UTC |
1 | 2020-05-24 05:25:00 UTC | 点(-66.8881 10.4994) | 2020-05-24 05:25:00 UTC | 2020-05-24 05:25:00 UTC |
1 | 2020-05-24 07:37:00 UTC | 点(-66.8873 10.5) | 2020-05-24 07:37:00 UTC | 2020-05-24 07:52:00 UTC |
1 | 2020-05-24 07:52:00 UTC | 点(-66.8873 10.5) | 2020-05-24 07:37:00 UTC | 2020-05-24 07:52:00 UTC |
1 | 2020-05-24 09:14:00 UTC | 点(-66.8888 10.4958) | 2020-05-24 09:14:00 UTC | 2020-05-24 09:14:00 UTC |
1 | 2020-05-24 11:13:00 UTC | 点(-66.89 10.5002) | 2020-05-24 05:22:00 UTC | 2020-05-24 11:13:00 UTC |
期望的结果:
ID | 时间戳_ | 几何 | 间隔开始 | 间隔结束 |
---|---|---|---|---|
1 | 2020-05-24 05:22:00 UTC | 点(-66.89 10.5002) | 2020-05-24 05:22:00 UTC | 2020-05-24 05:22:00 UTC |
1 | 2020-05-24 05:25:00 UTC | 点(-66.8881 10.4994) | 2020-05-24 05:25:00 UTC | 2020-05-24 05:25:00 UTC |
1 | 2020-05-24 07:37:00 UTC | 点(-66.8873 10.5) | 2020-05-24 07:37:00 UTC | 2020-05-24 07:52:00 UTC |
1 | 2020-05-24 07:52:00 UTC | 点(-66.8873 10.5) | 2020-05-24 07:37:00 UTC | 2020-05-24 07:52:00 UTC |
1 | 2020-05-24 09:14:00 UTC | 点(-66.8888 10.4958) | 2020-05-24 09:14:00 UTC | 2020-05-24 09:14:00 UTC |
1 | 2020-05-24 11:13:00 UTC | 点(-66.89 10.5002) | 2020-05-24 11:13:00 UTC | 2020-05-24 11:13:00 UTC |
解决方案
考虑下面
with selection as (
select 1 as id,TIMESTAMP '2020-05-24 11:13:00' as timestamp_, 'POINT(-66.89 10.5002)' as geom union all select
1,TIMESTAMP '2020-05-24 05:22:00','POINT(-66.89 10.5002)' union all select
1,TIMESTAMP '2020-05-24 05:25:00','POINT(-66.8881 10.4994)' union all select
1,TIMESTAMP '2020-05-24 09:14:00','POINT(-66.8888 10.4958)' union all select
1,TIMESTAMP '2020-05-24 07:37:00 UTC','POINT(-66.8873 10.5)' union all select
1, TIMESTAMP'2020-05-24 07:52:00 UTC','POINT(-66.8873 10.5)'
), pregrouped_selection as (
select id, timestamp_, geom,
countif(flag) over(partition by id order by timestamp_) grp
from (
select id, timestamp_, geom,
geom != ifnull(lag(geom) over(partition by id order by timestamp_), geom) flag,
from selection
)
order by id, timestamp_
)
select id,timestamp_,geom,
first_value(timestamp_)
OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_start,
last_value(timestamp_)
OVER (PARTITION BY id,grp ORDER BY timestamp_ ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS interval_end,
FROM
pregrouped_selection order by id,timestamp_
带输出
如您所见 - 我几乎 100% 保留了您的原始查询 - 只是替换geom
为grp
内部over()
语句 AND 从中pregrouped_selection
计算组号 - grp
推荐阅读
- javascript - 如何将带有数组的 JSON 对象导入 ajax 自动完成响应?
- bash - 需要在文件中连续打印当前的 CPU 使用率和内存使用率
- ruby-on-rails - Figaro 没有创建“application.yml”文件
- python - 如何在窗口中为 Tkinter 构建一个包,如 exe?
- php - PHP用自定义文本替换了一个标签不起作用
- angular - Angular 6我必须在客户端生成jwt以匹配服务器还是只匹配服务器
- angular - 使用 PrimeNG 和 Material 组件验证 Angular 表单
- sql-server - SQL Server 连接用户在使用“127.0.0.1”和“localhost”时会有所不同。为什么?
- sql-server - 为什么 Jet-SQL 转 T-SQL 时不能使用 format 函数?
- python - 来自 scipy.misc 的 imread 不适用于 pycharm。我似乎也无法安装 PIL