sql - Hive SQL - 获取重叠日期的最小和最大日期
问题描述
我有以下带有帐号和日期的数据集
Account Start Date End_Date
---------------------------------------
1111222333 05/01/2016 15/02/2016
1111222333 29/01/2016 04/04/2016
1111222333 20/03/2016 13/05/2016
1111222333 26/04/2016 06/06/2016
1111222333 05/05/2016 06/06/2016
1111222333 13/09/2016 10/10/2016
1111222333 14/10/2016 15/12/2016
1111222333 09/08/2017 25/08/2017
1111222333 25/10/2017 10/11/2017
1111222333 02/11/2017 05/01/2018
大多数日期范围都是重叠的,我需要重叠范围内的最小开始日期和最大结束日期,因此输出应如下所示:
Account Start Date End_Date
----------------------------------
1111222333 05/01/2016 06/06/2016
1111222333 13/09/2016 10/10/2016
1111222333 14/10/2016 15/12/2016
1111222333 09/08/2017 25/08/2017
1111222333 25/10/2017 05/01/2018
我是一个新手用户,并意识到上述内容远远超出了我目前的能力,所以交给各位专家:)
解决方案
这是一个差距和孤岛问题。解决它的一种方法是获取所有日期的列表并跟踪适用的“记录”数量。然后,一些窗口函数和聚合解决了这个问题。
with d as (
select account, startdate as dte, 1 as inc
from t
union all
select account, enddate as dte, -1 as inc
from t
),
ds as ( -- accumulate "inc". Rows with "0" are ends of islands
select account, dte, sum(running_inc) over (partition by account order by dte) as running_inc
from (select account, dte, sum(inc) as running_inc
from d
group by account, dte
) d
),
g as ( -- assign group
select account, dte,
sum(case when running_inc = 0 then 1 else 0 end) over (partition by account order by dte desc) as grp
from ds
)
select account, min(dte) as start_date, max(dte) as end_dte
from g
group by account, grp;
这是一个rextester(使用 Postgres)。
这是rextester中的代码:
with t(account, startdate, enddate) as (
select 1111222333, '2016-01-05'::date, '2016-02-15'::date union all
select 1111222333, '2016-01-29'::date, '2016-04-04'::date union all
select 1111222333, '2016-03-20'::date, '2016-05-13'::date union all
select 1111222333, '2016-04-26'::date, '2016-06-06'::date union all
select 1111222333, '2016-05-05'::date, '2016-06-06'::date union all
select 1111222333, '2016-09-13'::date, '2016-10-10'::date union all
select 1111222333, '2016-10-14'::date, '2016-12-15'::date union all
select 1111222333, '2017-08-09'::date, '2017-08-25'::date union all
select 1111222333, '2017-10-25'::date, '2017-11-10'::date union all
select 1111222333, '2017-11-02'::date, '2018-01-05'::date
),
d as (
select account, startdate as dte, 1 as inc
from t
union all
select account, enddate as dte, -1 as inc
from t
),
ds as ( -- accumulate "inc". Rows with "0" are ends of islands
select account, dte, sum(running_inc) over (partition by account order by dte) as running_inc
from (select account, dte, sum(inc) as running_inc
from d
group by account, dte
) d
),
g as ( -- assign group
select account, dte,
sum(case when running_inc = 0 then 1 else 0 end) over (partition by account order by dte desc) as grp
from ds
)
select account, min(dte) as start_date, max(dte) as end_dte
from g
group by account, grp;
推荐阅读
- android - Prorammatically 创建的布局在横向模式下使应用程序崩溃?
- xcode - Xcode 11 中缺少显示助理编辑器?
- java - Spring Security SAML 扩展的配置错误:“无法实例化 [org.springframework.security.saml.SAMLBootstrap]”
- elasticsearch - 如何将弹性搜索从 1.7.x 升级到 7.0?
- objective-c - 如何在不同 NSObject 中定义的 NSObject 中引用 IBOutlet
- javascript - 如何在 SVG 圆环图中旋转圆圈?
- python-2.7 - 分布式 Tensorflow 错误:检查失败:DeviceNameUtils::ParseFullName(new_base, &parsed_name)
- asp.net-core - 如何正确重定向到asp.netcore中的页面
- sql - 如何将字符串的特定部分解析为sql中的各个列?
- jenkins - 有没有办法将部署锁定为仅在一天中的特定时间进行?