sql - How to get the first and last value of a column for each partition using SQL
Problem description
My dataset looks like this.
ts c1 c2 c3
2019-01-04T01:50:00.000Z C 25.48801612854004 33.317527770996094
2019-01-04T01:51:00.000Z C 25.74610710144043 33.392295837402344
2019-01-04T01:52:00.000Z C 25.978872299194336 33.29177474975586
2019-01-04T01:53:00.000Z B 26.12158203125 33.2805061340332
2019-01-04T01:54:00.000Z B 26.28511619567871 33.26923751831055
2019-01-04T01:55:00.000Z C 26.470335006713867 33.25796890258789
2019-01-04T01:56:00.000Z C 26.63957977294922 33.24669647216797
2019-01-04T01:57:00.000Z C 26.954004287719727 33.23542785644531
2019-01-04T01:58:00.000Z C 27.08258056640625 33.224159240722656
2019-01-04T01:59:00.000Z A 27.25551986694336 33.212890625
2019-01-04T02:00:00.000Z A 27.514263153076172 33.201622009277344
2019-01-04T02:01:00.000Z A 27.588970184326172 33.17148971557617
2019-01-04T02:02:00.000Z B 27.727638244628906 33.13819122314453
2019-01-04T02:03:00.000Z B 27.956039428710938 33.104896545410156
2019-01-04T02:04:00.000Z B 28.152463912963867 33.10499954223633
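For each partition value in column "c1", I want to get the first and last value of "ts". I tried the following query, but it does not return the correct result.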
SELECT ts, c1, c2, c3,
first_value(ts) OVER (partition by c1 order by ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as first,
last_value(ts) OVER (partition by c1 order by ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as last
FROM `default`.`a07_a15`
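Problem: first returns only three distinct ts values, and last is completely wrong.
Expected: I need the first and last ts for each consecutive run of the same c1 value, so a value that reappears later starts a new group.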
ts c1 c2 c3 first last
2019-01-04T01:50:00.000Z C 25.48801612854004 33.317527770996094 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:51:00.000Z C 25.74610710144043 33.392295837402344 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:52:00.000Z C 25.978872299194336 33.29177474975586 2019-01-04T01:50:00.000Z 2019-01-04T01:52:00.000Z
2019-01-04T01:53:00.000Z B 26.12158203125 33.2805061340332 2019-01-04T01:53:00.000Z 2019-01-04T01:54:00.000Z
2019-01-04T01:54:00.000Z B 26.28511619567871 33.26923751831055 2019-01-04T01:53:00.000Z 2019-01-04T01:54:00.000Z
2019-01-04T01:55:00.000Z C 26.470335006713867 33.25796890258789 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:56:00.000Z C 26.63957977294922 33.24669647216797 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:57:00.000Z C 26.954004287719727 33.23542785644531 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:58:00.000Z C 27.08258056640625 33.224159240722656 2019-01-04T01:55:00.000Z 2019-01-04T01:58:00.000Z
2019-01-04T01:59:00.000Z A 27.25551986694336 33.212890625 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:00:00.000Z A 27.514263153076172 33.201622009277344 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:01:00.000Z A 27.588970184326172 33.17148971557617 2019-01-04T01:59:00.000Z 2019-01-04T02:01:00.000Z
2019-01-04T02:02:00.000Z B 27.727638244628906 33.13819122314453 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
2019-01-04T02:03:00.000Z B 27.956039428710938 33.104896545410156 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
2019-01-04T02:04:00.000Z B 28.152463912963867 33.10499954223633 2019-01-04T02:02:00.000Z 2019-01-04T02:04:00.000Z
Solution
Use lag() and lead():
select t.*
from (select t.*,
lag(c1) over (order by ts) as prev_c1,
lead(c1) over (order by ts) as next_c1
from t
) t
where prev_c1 is null or next_c1 is null or
prev_c1 <> c1 or next_c1 <> c1;
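To see which rows this filter keeps, here is a minimal sketch that runs the same query against the question's sample data. It assumes an in-memory SQLite database (3.25 or later, for window functions) driven from Python; the table name t, the derived-table alias s, and the row-building helper are illustrative choices, not part of the original setup.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical harness: rebuild the question's (ts, c1) columns in a table named t.
LABELS = "CCCBBCCCCAAABBB"  # c1 values in ts order, copied from the sample data
START = datetime(2019, 1, 4, 1, 50)
rows = [((START + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%S.000Z"), c)
        for i, c in enumerate(LABELS)]

con = sqlite3.connect(":memory:")  # assumption: SQLite 3.25+ (window functions)
con.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Keep a row when its neighbour (by ts) is missing or has a different c1,
# i.e. when the row starts or ends a run of equal c1 values.
boundary_rows = con.execute("""
    SELECT ts, c1
    FROM (SELECT t.*,
                 LAG(c1)  OVER (ORDER BY ts) AS prev_c1,
                 LEAD(c1) OVER (ORDER BY ts) AS next_c1
          FROM t) AS s
    WHERE prev_c1 IS NULL OR next_c1 IS NULL
       OR prev_c1 <> c1 OR next_c1 <> c1
    ORDER BY ts
""").fetchall()
con.close()

for ts, c1 in boundary_rows:
    print(ts, c1)
```

On the sample data this should keep the start and end row of each of the five runs. A run consisting of a single row would appear only once, because that row is both its own start and end.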
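This puts the values on separate rows. If you want them on the same row, treating this as a gaps-and-islands problem is probably the simplest solution: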
select c1, min(ts), max(ts)
from (select t.*,
row_number() over (order by ts) as seqnum,
row_number() over (partition by c1 order by ts) as seqnum_2
from t
) t
group by c1, (seqnum - seqnum_2);
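The grouping works because, within a run of consecutive rows sharing the same c1, both row numbers increase by one per row, so their difference stays constant; when c1 reappears after a gap, the difference changes. The sketch below prints that difference for each row and then runs the grouped query. It assumes SQLite 3.25+ via Python's sqlite3; the table name t, the alias s, and the data-building lines are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical harness: rebuild the question's (ts, c1) columns in a table named t.
LABELS = "CCCBBCCCCAAABBB"  # c1 values in ts order, copied from the sample data
START = datetime(2019, 1, 4, 1, 50)
rows = [((START + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%S.000Z"), c)
        for i, c in enumerate(LABELS)]

con = sqlite3.connect(":memory:")  # assumption: SQLite 3.25+ (window functions)
con.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

NUMBERED = """
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY ts) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY ts) AS seqnum_2
    FROM t
"""

# Step 1: inspect the per-row difference that identifies each island.
diffs = con.execute(
    f"SELECT c1, seqnum - seqnum_2 FROM ({NUMBERED}) AS s ORDER BY ts"
).fetchall()

# Step 2: collapse each island to one row with its first and last ts.
islands = con.execute(f"""
    SELECT c1, MIN(ts) AS first_ts, MAX(ts) AS last_ts
    FROM ({NUMBERED}) AS s
    GROUP BY c1, (seqnum - seqnum_2)
    ORDER BY first_ts
""").fetchall()
con.close()

for row in islands:
    print(row)
```

The difference alone is not unique across different c1 values in general, which is why c1 stays in the GROUP BY alongside it.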
Edit:
If you need to keep the original rows, just use window functions:
select t.*,
min(ts) over (partition by c1, (seqnum - seqnum_2)) as min_ts,
max(ts) over (partition by c1, (seqnum - seqnum_2)) as max_ts
from (select t.*,
row_number() over (order by ts) as seqnum,
row_number() over (partition by c1 order by ts) as seqnum_2
from t
) t
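For comparison with the query in the question: partition by c1 alone groups every row with the same c1 regardless of adjacency, which yields only three distinct first values, and a frame ending at CURRENT ROW makes last_value return the current row's own ts. Partitioning by c1 plus the row-number difference avoids both issues. The sketch below checks the row-preserving query against the expected output, with the alias written consistently as seqnum_2. It assumes SQLite 3.25+ via Python's sqlite3; the table name t, the alias s, and the data-building lines are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical harness: rebuild the question's (ts, c1) columns in a table named t.
LABELS = "CCCBBCCCCAAABBB"  # c1 values in ts order, copied from the sample data
START = datetime(2019, 1, 4, 1, 50)
rows = [((START + timedelta(minutes=i)).strftime("%Y-%m-%dT%H:%M:%S.000Z"), c)
        for i, c in enumerate(LABELS)]

con = sqlite3.connect(":memory:")  # assumption: SQLite 3.25+ (window functions)
con.execute("CREATE TABLE t (ts TEXT, c1 TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Every original row is kept; min_ts and max_ts are computed over the whole
# island because the window has no ORDER BY and therefore no running frame.
annotated = con.execute("""
    SELECT ts, c1,
           MIN(ts) OVER (PARTITION BY c1, (seqnum - seqnum_2)) AS min_ts,
           MAX(ts) OVER (PARTITION BY c1, (seqnum - seqnum_2)) AS max_ts
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY ts) AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY c1 ORDER BY ts) AS seqnum_2
          FROM t) AS s
    ORDER BY ts
""").fetchall()
con.close()

for row in annotated:
    print(row)
```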