首页 > 解决方案 > 从重叠的日期范围中获取不同的连续日期范围

问题描述

我需要从重叠日期列表中获取不相互重叠的日期范围列表,并获取重叠期间的硬币总和。我试过用谷歌搜索一个例子,但到目前为止还没有运气。我可能没有使用正确的关键词?

我有一个重叠日期列表

1.1.2018 - 31.1.2018 80
7.1.2018 - 10.1.2018 10
7.1.2018 - 31.1.2018 10
11.1.2018 - 31.1.2018 5
25.1.2018 - 27.1.2018 5
2.2.2018 - 23.2.2018 100

期望的结果是

1.1.2018 - 6.7.2018 80 coins
7.1.2018 - 10.1.2018 100 coins
11.1.2018 - 24.1.2018 95 coins
25.1.2018 - 27.1.2018 100 coins
28.1.2018 - 31.1.2018 95 coins
2.2.2018 - 23.2.2018 100 coins

这是一个应该如何工作的图

|------------------------------|
       |---|
       |-----------------------|
           |-------------------|
                      |---|
                                   |----------------------|
Outcome              
|------|---|----------|---|----|   |----------------------|
   80   100     95     100  95                100

这是我的测试数据

drop table coinsonperiod2;
create table coinsonperiod2(
  id serial,
  startdate date,
  enddate date,
  coins integer,
  userid integer
);
insert into coinsonperiod2 (startdate, enddate, coins,userid) values
  ('2018-01-01','2018-01-31', 80,1)
, ('2018-01-07','2018-01-10', 10,1)
, ('2018-01-07','2018-01-31', 10,1)
, ('2018-01-11','2018-01-31', 5,1)
, ('2018-01-25','2018-01-27', 5,1)
, ('2018-02-02','2018-02-23', 100,2)
, ('2018-01-01','2018-01-31', 80,2)
, ('2018-01-07','2018-01-10', 10,2)
, ('2018-01-07','2018-01-31', 10,2)
, ('2018-01-11','2018-01-31', 5,2)
, ('2018-01-25','2018-01-27', 5,2)
, ('2018-02-02','2018-02-23', 100,3)
; 

更新:实际上 StephenM 和 joops 的答案不符合我想要的结果。两个答案都显示 enddate 错误。

当一个时期结束时,下一个时期应该在第二天开始(或者如果有间隔则更晚)。在我想要的结果中,1.1.2018-6.1.2018 包括第 6 天。6th 和 7th 之间没有差距,因为 7th 包含在 7.1.2018-10.1.2018 中。

UPDATE2:现在我明白了开放、半开放和封闭间隔之间的区别。在 joops 解决方案中,必须针对半开区间进行计算,但我想要的结果是闭区间。这就是为什么必须减少 enddate 以使结果为闭区间。如果我错了,请纠正我。

我还在示例数据中添加了用户 ID,并进一步修改了 joops 解决方案。这是给我想要的结果的查询。

with changes AS (
  SELECT
    userid,
    startdate AS tickdate,
    coins,
    1         AS cover
  FROM coinsonperiod2
  UNION ALL
  -- add 1 day to correct intervals into half open intervals, so the calculation is correct
  SELECT
    userid,
    1 + enddate AS tickdate,
    -1 * coins,
    -1          AS cover
  FROM coinsonperiod2
)
, sumchanges  AS (
    SELECT
      userid,
      tickdate,
      SUM(coins) AS change,
      SUM(cover) AS cover
    FROM changes
    GROUP BY tickdate, userid
)
, aggregated AS (
    SELECT
      userid   AS userid,
      tickdate AS startdate,
      lead(tickdate)
      over www AS enddate,
      sum(change)
      OVER www AS cash,
      sum(cover)
      OVER www AS cover
    FROM sumchanges
    WINDOW www AS (
      partition by userid
      ORDER BY tickdate )
)
-- reduce 1 day from the enddate to make closed interval
SELECT
userid
, startdate
, enddate-1 as enddate
, cash
, cover
FROM aggregated
WHERE cover > 0
ORDER BY userid, startdate
;

结果: 结果

标签: sqlpostgresqloverlap

解决方案


逻辑是:

  • 在间隔开始时将其值添加到累积总和中
  • 在间隔结束时从这个总和中减去它的值
  • 但为了扫描日期线,我们必须收集所有(唯一的)日期/时间戳,无论是开始还是停止。

所以重点是:将数据从一系列间隔转换为一系列(开始/停止)事件,并聚合这些。


-- \i tmp.sql

create table coinsonperiod(
  id serial,
  startdate date,
  enddate date,
  coins integer
);
insert into coinsonperiod (startdate, enddate, coins) values
  ('2018-01-01','2018-01-31', 80)
, ('2018-01-07','2018-01-10', 10)
, ('2018-01-07','2018-01-31', 10)
, ('2018-01-11','2018-01-31', 5)
, ('2018-01-25','2018-01-27', 5)
, ('2018-02-02','2018-02-23', 100)
        ;

WITH changes AS (
    SELECT startdate AS tickdate , coins
            , 1 AS cover
    FROM coinsonperiod
    UNION ALL
    -- add 1 day to convert to half-open intervals
    SELECT 1+enddate AS tickdate, -1* coins
            , -1 AS cover
    FROM coinsonperiod
    )
, sumchanges  AS (
        SELECT tickdate, SUM(coins) AS change, SUM(cover) AS cover
        FROM changes
        GROUP BY tickdate
        )
, aggregated AS (
        SELECT
        tickdate AS startdate
        , lead(tickdate) over www AS enddate
        , sum(change) OVER www AS cash
          -- number of covered intervals
        , sum(cover) OVER www AS cover
        FROM sumchanges
        WINDOW www AS (ORDER BY tickdate)
        )
             -- substract one day from enddate to correct back to closed intervals
SELECT startdate, enddate-1 AS enddate, cash, cover
FROM aggregated
WHERE cover > 0
ORDER BY startdate
        ;

推荐阅读