首页 > 解决方案 > 分析事件发生的 SQL 时间序列

问题描述

我正在尝试为 PL/SQL 中的特定时间序列分析提出一个解决方案。我将使用一个非常简单的数据集来解释这个场景。在 COMP_TEST 表中,只要在 EVENT_START_DATE 和 EVENT_END_DATE 之间存在单独的日历日 (CAL_DATE),事件就会获得为其创建的行。在 EVENT_ID 1772349 的情况下,我们会得到 10/01/2020 – 10/24/2020 的行,因为那是开始和结束范围。在每个给定的日历日,某些事件的结果在 EVENT_RESULT 列中记录为“是”或“否”。

检查要求:

注意:此处的 7 个日历日应包括开始日期和结束日期。

在 EVENT_ID 1772349 的情况下,第一个 CAL_DATE 01-OCT-20 的结果为“是”,因此接下来的 7 个日历日评估将从 02-OCT-20(第 1 天)开始,到 08-OCT-20(第 7 天)结束) 其中记录了“是”结果。第二次评估将从 09-OCT-20 开始,这恰好也是“是”结果,因此下一次计数实际上将从 10-OCT-20 开始,结果为“否”并继续为“否”直到16-OCT-20 连续 7 天。然后第三次评估将从 17-OCT-20 开始,之后在 18-OCT-20 只有一个额外的“否”结果,然后从 19-OCT-20 到 23-OCT-20 的所有日子都是“是”结果。最后,在 20 年 10 月 24 日活动的最后一个日历日,我们击中了一个“否”,这将被归类为缺陷。

我在下面包含了创建表脚本、我尝试过的查询以及预期的输出。由于这听起来很适合差距和孤岛问题,所以我从这种方法开始,但不知道如何解决其余的要求。

表创建脚本:

CREATE TABLE COMP_TEST ( 
event_id INT 
,cal_date DATE 
,event_result varchar2(5) 
,event_start_date DATE 
,event_end_date DATE 
)

表格插入:

INSERT ALL 
 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/01/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/02/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/03/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/05/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/06/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/07/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/08/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/09/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/10/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/11/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/12/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/13/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/14/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/16/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/17/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/18/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/19/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/20/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/21/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/22/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/23/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (1772349,TO_DATE('10/24/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY')) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/16/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/17/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/18/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/19/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/20/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/21/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/22/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/23/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/24/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/25/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/26/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/27/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/28/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/29/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/30/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('10/31/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/01/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/02/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/03/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/05/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/06/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/07/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/08/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/09/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/10/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/11/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/12/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/13/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL) 
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/14/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST  (event_id, cal_date, event_result, event_start_date, event_end_date)    VALUES (5499345,TO_DATE('11/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
SELECT * FROM DUAL

我试过的查询:

WITH DAILY_RESULT AS ( 
SELECT 
    event_id 
    , cal_date 
    , CASE 
        WHEN event_result = 'yes' THEN 'compliant'  
        WHEN event_result = 'no' THEN 'deficient'  
        END AS status 
FROM COMP_TEST 
), 
 
RANKINGS AS ( 
SELECT 
    event_id 
    , cal_date 
    , status AS status 
    , DENSE_RANK() OVER (PARTITION BY event_id ORDER BY cal_date) - 
    DENSE_RANK() OVER (PARTITION BY event_id, status ORDER BY cal_date) 
        AS sequence_grouping 
FROM DAILY_RESULT 
ORDER BY cal_date ASC 
) 
 
SELECT 
    event_id 
    , MIN(cal_date) AS start_date 
    , MAX(cal_date) AS end_date 
    , MAX(cal_date) - MIN(cal_date) AS duration 
     
FROM RANKINGS 
WHERE status = 'deficient' 
 
GROUP BY  
    event_id 
    , sequence_grouping 

HAVING MAX(cal_date) - MIN(cal_date) >=7     

期望的输出:

EVENT_ID - START_DATE - END_DATE
1772349 -- 10/10/2020 -- 10/16/2020
1772349 -- 10/24/2020 -- 10/24/2020
5499345 -- 10/15/2020 -- 10/21/2020
5499345 -- 10/25/2020 -- 10/31/2020
5499345 -- 11/02/2020 -- 11/08/2020
5499345 -- 11/09/2020 -- 11/15/2020

标签: plsqltime-series

解决方案


我认为这是你想要的,但我无法弄清楚你在deficient_cnt. 此外,您的问题可能会使用一些格式,因为它真的很难阅读和理解要求。感谢您提供 aCREATE TABLE和 DML 语句。

好的,我认为要简单得多的 PL/SQL 解决方案。你可以用一个循环来做到这一点,但我认为解释两个循环会更清楚。如果您没有大量数据,那么这可能无关紧要。

我们遍历每个事件(外循环),然后是该事件每天的数据(内循环)。我确实注意到您在event_id2020 年 11 月 4 日为 5499345 放入了两行(不确定这是否是故意的),但是GROUP BY内部循环上的处理:使用 MAX() 会更喜欢 a'yes'而不是 a 'no''no'然后,我们只需计算推进范围结束日期的连续值。遇到一个'yes'值会重置计数器,点击 7 天会打印结果并重置。对于'no'给定的event_id.

DECLARE
  v_cal_date_ct INTEGER;
  v_start_date  comp_test.cal_date%TYPE;
  v_end_date    comp_test.cal_date%TYPE;
BEGIN
  FOR rec_event IN (SELECT DISTINCT ct.event_id
                      FROM comp_test ct
                     ORDER BY ct.event_id)
  LOOP
    v_cal_date_ct := 0;
  
    FOR rec_date IN (SELECT ct.cal_date,
                            MAX(ct.event_result) AS event_result
                       FROM comp_test ct
                      WHERE ct.event_id = rec_event.event_id
                      GROUP BY ct.cal_date
                      ORDER BY ct.cal_date)
    LOOP
      IF v_cal_date_ct = 0 THEN
        v_start_date := rec_date.cal_date;
        v_end_date   := rec_date.cal_date;
      END IF;
    
      IF rec_date.event_result = 'no' THEN
        v_cal_date_ct := v_cal_date_ct + 1;
        v_end_date    := rec_date.cal_date;
      ELSE
        v_cal_date_ct := 0;
      END IF;
    
      IF v_cal_date_ct >= 7 THEN
        dbms_output.put_line(rec_event.event_id || ' ' ||
                             to_char(v_start_date,
                                     'mm/dd/yyyy') || ' ' ||
                             to_char(v_end_date,
                                     'mm/dd/yyyy'));
        v_cal_date_ct := 0;
      END IF;
    END LOOP;
  
    IF v_cal_date_ct > 0 THEN
      dbms_output.put_line(rec_event.event_id || ' ' ||
                           to_char(v_start_date,
                                   'mm/dd/yyyy') || ' ' ||
                           to_char(v_end_date,
                                   'mm/dd/yyyy'));
    END IF;
  END LOOP;
END;
/

结果:

1772349 10/10/2020 10/16/2020
1772349 10/24/2020 10/24/2020
5499345 10/15/2020 10/21/2020
5499345 10/25/2020 10/31/2020
5499345 11/02/2020 11/08/2020
5499345 11/09/2020 11/15/2020

最后,记得SET serveroutput ON在 SQL*Plus 中执行一次以查看结果。如果您想为自己构建一个表以插入或将其转换为流水线函数,则相同的逻辑将起作用。


推荐阅读