plsql - 分析事件发生的 SQL 时间序列
问题描述
我正在尝试为 PL/SQL 中的特定时间序列分析提出一个解决方案。我将使用一个非常简单的数据集来解释这个场景。在 COMP_TEST 表中,只要在 EVENT_START_DATE 和 EVENT_END_DATE 之间存在单独的日历日 (CAL_DATE),事件就会获得为其创建的行。在 EVENT_ID 1772349 的情况下,我们会得到 10/01/2020 – 10/24/2020 的行,因为那是开始和结束范围。在每个给定的日历日,某些事件的结果在 EVENT_RESULT 列中记录为“是”或“否”。
检查要求:
- 每 7 个日历日应该有一个“是”EVENT_RESULT。
- 两次“是”结果之间的间隔不应超过 7 个日历日。
- 连续 7 个日历日的“否”结果将被计为一个不合规/缺陷单位。
- 每次遇到“是”结果时,评估接下来 7 个日历日的“否”结果的计数应重置回 1。
- 如果在 CAL_DATE 遇到等于 EVENT_END_DATE 的“否”结果,那么这也将计为一个不合规/缺陷单位
注意:此处的 7 个日历日应包括开始日期和结束日期。
在 EVENT_ID 1772349 的情况下,第一个 CAL_DATE 01-OCT-20 的结果为“是”,因此接下来的 7 个日历日评估将从 02-OCT-20(第 1 天)开始,到 08-OCT-20(第 7 天)结束) 其中记录了“是”结果。第二次评估将从 09-OCT-20 开始,这恰好也是“是”结果,因此下一次计数实际上将从 10-OCT-20 开始,结果为“否”并继续为“否”直到16-OCT-20 连续 7 天。然后第三次评估将从 17-OCT-20 开始,之后在 18-OCT-20 只有一个额外的“否”结果,然后从 19-OCT-20 到 23-OCT-20 的所有日子都是“是”结果。最后,在 20 年 10 月 24 日活动的最后一个日历日,我们击中了一个“否”,这将被归类为缺陷。
我在下面包含了创建表脚本、我尝试过的查询以及预期的输出。由于这听起来很适合差距和孤岛问题,所以我从这种方法开始,但不知道如何解决其余的要求。
表创建脚本:
CREATE TABLE COMP_TEST (
event_id INT
,cal_date DATE
,event_result varchar2(5)
,event_start_date DATE
,event_end_date DATE
)
表格插入:
INSERT ALL
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/01/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/02/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/03/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/05/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/06/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/07/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/08/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/09/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/10/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/11/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/12/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/13/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/14/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/16/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/17/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/18/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/19/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/20/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/21/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/22/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/23/2020','MM/DD/YYYY'),'yes',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (1772349,TO_DATE('10/24/2020','MM/DD/YYYY'),'no',TO_DATE('10/01/2020','MM/DD/YYYY'),TO_DATE('10/24/2020','MM/DD/YYYY'))
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/16/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/17/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/18/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/19/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/20/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/21/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/22/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/23/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/24/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/25/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/26/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/27/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/28/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/29/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/30/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('10/31/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/01/2020','MM/DD/YYYY'),'yes',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/02/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/03/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/04/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/05/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/06/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/07/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/08/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/09/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/10/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/11/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/12/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/13/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/14/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
INTO COMP_TEST (event_id, cal_date, event_result, event_start_date, event_end_date) VALUES (5499345,TO_DATE('11/15/2020','MM/DD/YYYY'),'no',TO_DATE('10/15/2020','MM/DD/YYYY'),NULL)
SELECT * FROM DUAL
我试过的查询:
WITH DAILY_RESULT AS (
SELECT
event_id
, cal_date
, CASE
WHEN event_result = 'yes' THEN 'compliant'
WHEN event_result = 'no' THEN 'deficient'
END AS status
FROM COMP_TEST
),
RANKINGS AS (
SELECT
event_id
, cal_date
, status AS status
, DENSE_RANK() OVER (PARTITION BY event_id ORDER BY cal_date) -
DENSE_RANK() OVER (PARTITION BY event_id, status ORDER BY cal_date)
AS sequence_grouping
FROM DAILY_RESULT
ORDER BY cal_date ASC
)
SELECT
event_id
, MIN(cal_date) AS start_date
, MAX(cal_date) AS end_date
, MAX(cal_date) - MIN(cal_date) AS duration
FROM RANKINGS
WHERE status = 'deficient'
GROUP BY
event_id
, sequence_grouping
HAVING MAX(cal_date) - MIN(cal_date) >=7
期望的输出:
EVENT_ID - START_DATE - END_DATE
1772349 -- 10/10/2020 -- 10/16/2020
1772349 -- 10/24/2020 -- 10/24/2020
5499345 -- 10/15/2020 -- 10/21/2020
5499345 -- 10/25/2020 -- 10/31/2020
5499345 -- 11/02/2020 -- 11/08/2020
5499345 -- 11/09/2020 -- 11/15/2020
解决方案
我认为这是你想要的,但我无法弄清楚你在deficient_cnt
. 此外,您的问题可能会使用一些格式,因为它真的很难阅读和理解要求。感谢您提供 aCREATE TABLE
和 DML 语句。
好的,我认为要简单得多的 PL/SQL 解决方案。你可以用一个循环来做到这一点,但我认为解释两个循环会更清楚。如果您没有大量数据,那么这可能无关紧要。
我们遍历每个事件(外循环),然后是该事件每天的数据(内循环)。我确实注意到您在event_id
2020 年 11 月 4 日为 5499345 放入了两行(不确定这是否是故意的),但是GROUP BY
内部循环上的处理:使用 MAX() 会更喜欢 a'yes'
而不是 a 'no'
。'no'
然后,我们只需计算推进范围结束日期的连续值。遇到一个'yes'
值会重置计数器,点击 7 天会打印结果并重置。对于'no'
给定的event_id
.
DECLARE
v_cal_date_ct INTEGER;
v_start_date comp_test.cal_date%TYPE;
v_end_date comp_test.cal_date%TYPE;
BEGIN
FOR rec_event IN (SELECT DISTINCT ct.event_id
FROM comp_test ct
ORDER BY ct.event_id)
LOOP
v_cal_date_ct := 0;
FOR rec_date IN (SELECT ct.cal_date,
MAX(ct.event_result) AS event_result
FROM comp_test ct
WHERE ct.event_id = rec_event.event_id
GROUP BY ct.cal_date
ORDER BY ct.cal_date)
LOOP
IF v_cal_date_ct = 0 THEN
v_start_date := rec_date.cal_date;
v_end_date := rec_date.cal_date;
END IF;
IF rec_date.event_result = 'no' THEN
v_cal_date_ct := v_cal_date_ct + 1;
v_end_date := rec_date.cal_date;
ELSE
v_cal_date_ct := 0;
END IF;
IF v_cal_date_ct >= 7 THEN
dbms_output.put_line(rec_event.event_id || ' ' ||
to_char(v_start_date,
'mm/dd/yyyy') || ' ' ||
to_char(v_end_date,
'mm/dd/yyyy'));
v_cal_date_ct := 0;
END IF;
END LOOP;
IF v_cal_date_ct > 0 THEN
dbms_output.put_line(rec_event.event_id || ' ' ||
to_char(v_start_date,
'mm/dd/yyyy') || ' ' ||
to_char(v_end_date,
'mm/dd/yyyy'));
END IF;
END LOOP;
END;
/
结果:
1772349 10/10/2020 10/16/2020
1772349 10/24/2020 10/24/2020
5499345 10/15/2020 10/21/2020
5499345 10/25/2020 10/31/2020
5499345 11/02/2020 11/08/2020
5499345 11/09/2020 11/15/2020
最后,记得SET serveroutput ON
在 SQL*Plus 中执行一次以查看结果。如果您想为自己构建一个表以插入或将其转换为流水线函数,则相同的逻辑将起作用。
推荐阅读
- mysql - mysql查询需要双重别名才能运行
- python - 如何配置 Python 以使用与我的计算机相同的代理设置
- javascript - 如何从快乐数算法中获得回报?
- scala - Spark scala - 计算动态时间戳间隔
- python - 命令在 shell 中有效,但在子进程中无效
- node.js - Vue UI 的内存限制如何提高
- r - 按 varname 汇总分组
- ruby-on-rails - 触发回调 after_commit 但不是由于触摸
- spring-boot - 使用 HTTPUrlConnection 进行 Mockito 单元测试
- asp.net-mvc-5 - MVC 5 Ajax.BeginForm 提交按钮调用当前页面而不是指定的控制器和操作的 URL