SQL (Presto) - "Collapse" rows when date ranges are contiguous

Problem description

I have this data (sample):

event_id    period_start    period_end  rating
100269      2/8/2016        6/30/2016   1
100269      6/30/2016       12/31/2016  1
100269      12/31/2016      6/30/2017   2
100269      6/30/2017       12/31/2017  2

When the periods (period_start, period_end) are immediately contiguous and the rating is the same, I want to "collapse" those rows. The desired output is:

event_id    period_start    period_end  rating
100269      2/8/2016        12/31/2016  1
100269      12/31/2016      12/31/2017  2

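Note that in this dataset the periods are not always contiguous for some event_id values. Here is an example, followed by the desired output: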

event_id    period_start    period_end  rating
100300      2/8/2016        6/30/2016   1
100300      6/30/2016       12/31/2016  1
100300      6/30/2017       12/31/2017  1

Desired output:

event_id    period_start    period_end  rating
100300      2/8/2016        12/31/2016  1
100300      6/30/2017       12/31/2017  1

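You can determine whether a period is immediately contiguous by testing whether the previous row's period_end equals the current row's period_start (this holds across the whole dataset as the way to identify immediately contiguous periods).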

I think there is a solution here using GROUP BY, but I'm not seeing it. Any help would be great. Thanks!

Tags: sql, presto

Solution


with a as (
    select *,
        case when lag(period_end) over (partition by event_id, rating order by period_start) = period_start
           then 0 else 1 end as brk
    from T
), b as (
    select *,
        sum(brk) over (partition by event_id, rating order by period_start) as grp
    from a
)
select event_id, min(period_start) as period_start, max(period_end) as period_end, rating
from b
group by event_id, grp, rating
order by event_id, grp, rating

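Determine which rows are breaks in the series and mark them with 1: a row is a break when the previous row's period_end (within the same event_id and rating) does not equal its period_start, which includes the first row of each partition, where lag returns NULL. Number the groups by counting the breaks, i.e. taking a running total. Then use group by to collapse each group into a single row.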
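
To check the steps above against the sample data from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for Presto here, which is an assumption of this sketch: it needs an SQLite build with window-function support (3.25 or later). The dates are rewritten as ISO strings so that text ordering matches date ordering, and the helper name collapse is hypothetical.

```python
import sqlite3

# Sample rows from the question. Dates are ISO strings (an assumption of this
# sketch) so that SQLite's text comparison orders them chronologically.
rows = [
    (100269, "2016-02-08", "2016-06-30", 1),
    (100269, "2016-06-30", "2016-12-31", 1),
    (100269, "2016-12-31", "2017-06-30", 2),
    (100269, "2017-06-30", "2017-12-31", 2),
    (100300, "2016-02-08", "2016-06-30", 1),
    (100300, "2016-06-30", "2016-12-31", 1),
    (100300, "2017-06-30", "2017-12-31", 1),
]

# Same query as in the answer, with the comma between the two CTEs.
QUERY = """
with a as (
    select *,
        case when lag(period_end) over (partition by event_id, rating order by period_start) = period_start
             then 0 else 1 end as brk
    from T
), b as (
    select *,
        sum(brk) over (partition by event_id, rating order by period_start) as grp
    from a
)
select event_id, min(period_start) as period_start, max(period_end) as period_end, rating
from b
group by event_id, grp, rating
order by event_id, grp, rating
"""


def collapse(rows):
    """Load rows into a throwaway table T and return the collapsed periods."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table T (event_id integer, period_start text, "
        "period_end text, rating integer)"
    )
    conn.executemany("insert into T values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


result = collapse(rows)
for row in result:
    print(row)
```

For event_id 100300 the brk values are 1, 0, 1, so the running total grp is 1, 1, 2 and the third row ends up in its own group, which is why the non-contiguous period stays separate.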

