sql - DB2 SQL:考虑优先级和删除指示符的 LISTAGG
问题描述
我有一个如下所示的数据集(数据集按时间戳排序)。在下面的 EMP_ID 和 DEPT_ID 组合数据集中,可以分配不同的课程。我们必须找出在 EMP_ID 和 DEPT_ID 组合的日期范围内有效的所有课程。
我们考虑了 F_PRIORITY(第一优先级)和 S_PRIORITY(第二优先级)属性:
- “严重 > 高 > 中 > 低”对第一优先级有效。
- 如果对于同一日期(YYYYMMDD)的两条记录,如果 F_PRIORITY 值相同,则需要考虑 S_PRIORITY。S_PRIORITY 总是有整数,最高值获得优先级。
- 如果对于任何日期 (YYYYMMDD),我们发现 F_PRIORITY 和 S_PRIORITY 相同,则需要将多个课程与逗号分隔符组合在一起。
- 在任何时候,如果发现任何课程被删除 (DELETED=1) 并且如果该课程被认为是基于优先级值,则需要从下一个有效开始日期删除。
- 如果在任何日期找到具有高优先级的 EMP_ID 和 DEPT_ID 组合的课程,则同一课程应在所有下一个可用日期中流动。如果在 EMP_ID 和 DEPT_ID 组合的下一个可用日期找到相同的优先级课程,则这两个课程都应以逗号分隔。如果课程的优先级在下一个可用日期更高,则它将取代前一个日期的课程。
我想在 DB2 数据库中计算所有这些。如果简单的查询无济于事,请告诉我是否有任何功能可以提供帮助。
源数据集:
D_DATE EMP_ID DEPT_ID COURSE_ID F_PRIORITY S_PRIORITY DELETED D_TIMESTAMP
20190419 E1 D1 C1 Low 2 0 2019-04-19 12:10:18
20190521 E1 D1 C2 Medium 2 0 2019-05-21 12:10:18
20190521 E1 D1 C3 High 4 0 2019-05-21 14:10:18
20190521 E1 D1 C4 High 4 0 2019-05-21 14:11:18
20190621 E1 D1 C3 High 4 1 2019-06-21 15:10:18
20190621 E1 D1 C5 High 4 0 2019-06-21 16:10:18
20191021 E1 D1 C6 High 8 0 2019-06-21 16:10:18
20200121 E1 D1 C7 Critical 4 0 2019-05-21 14:10:18
20200121 E1 D1 C8 Critical 4 0 2019-05-21 14:11:18
20190419 E2 Null C1 Low 2 0 2019-04-19 12:10:18
20190521 E2 Null C2 Medium 2 0 2019-05-21 12:10:18
20190521 E2 Null C3 High 4 0 2019-05-21 14:10:18
20190521 E2 Null C4 High 4 0 2019-05-21 14:11:18
20190531 E2 Null C2 Medium 2 1 2019-05-31 15:01:18
20190621 E2 D2 C3 High 4 0 2019-06-21 15:10:18
20190621 E2 D2 C5 High 6 0 2019-06-21 16:10:18
20190721 E2 D2 C5 High 6 1 2019-06-21 15:10:18
20191021 E2 D2 C7 Low 8 0 2019-06-21 16:10:18
预期输出:
EMP_ID DEPT_ID COURSE_DTL S_DATE E_DATE
E1 D1 C1 20190419 20190520
E1 D1 C3,C4 20190521 20190620 (C2 not considered as Medium)
E1 D1 C4,C5 20190621 20191020 (C3 Deleted, so removed)
E1 D1 C6 20191021 20200120 (C6 takes priority as S_PRIORITY=8)
E1 D1 C7,C8 20200121 99991231 (C7, C8 F_PRIORITY=Critical; so takes priority)
E2 Null C1 20190419 20190520
E2 Null C3,C4 20190521 99991231 (addition of C2 does not impact as F_PRIORITY is less than C3 and C4. Also DELETED of C2 does not impact as C2 was not there in the consideration from Date=20190521)
E2 D2 C5 20190621 20190720
E2 D2 Null 20190721 20191020
E2 D2 C7 20191021 99991231
解决方案
你没有给我一个关于更高优先级压倒低优先级有效性的答案,所以我做了两个例子。在第一个示例中,较高优先级的事件“删除”较低优先级的事件,与您的预期输出一致。
WITH source (d_date, emp_id, dept_id, course_id, f_priority, s_priority, deleted, d_timestamp) AS ( VALUES ('20190419', 'E1', 'D1', 'C1', 'Low', 2, 0, '2019-04-19 12:10:18'), ('20190521', 'E1', 'D1', 'C2', 'Medium', 2, 0, '2019-05-21 12:10:18'), ('20190521', 'E1', 'D1', 'C3', 'High', 4, 0, '2019-05-21 14:10:18'), ('20190521', 'E1', 'D1', 'C4', 'High', 4, 0, '2019-05-21 14:11:18'), ('20190621', 'E1', 'D1', 'C3', 'High', 4, 1, '2019-06-21 15:10:18'), ('20190621', 'E1', 'D1', 'C5', 'High', 4, 0, '2019-06-21 16:10:18'), ('20191021', 'E1', 'D1', 'C6', 'High', 8, 0, '2019-06-21 16:10:18'), ('20200121', 'E1', 'D1', 'C7', 'Critical', 4, 0, '2019-05-21 14:10:18'), ('20200121', 'E1', 'D1', 'C8', 'Critical', 4, 0, '2019-05-21 14:11:18'), ('20190419', 'E2', NULL, 'C1', 'Low', 2, 0, '2019-04-19 12:10:18'), ('20190521', 'E2', NULL, 'C2', 'Medium', 2, 0, '2019-05-21 12:10:18'), ('20190521', 'E2', NULL, 'C3', 'High', 4, 0, '2019-05-21 14:10:18'), ('20190521', 'E2', NULL, 'C4', 'High', 4, 0, '2019-05-21 14:11:18'), ('20190531', 'E2', NULL, 'C2', 'Medium', 2, 1, '2019-05-31 15:01:18'), ('20190621', 'E2', 'D2', 'C3', 'High', 4, 0, '2019-06-21 15:10:18'), ('20190621', 'E2', 'D2', 'C5', 'High', 6, 0, '2019-06-21 16:10:18'), ('20190721', 'E2', 'D2', 'C5', 'High', 6, 1, '2019-06-21 15:10:18'), ('20191021', 'E2', 'D2', 'C7', 'Low', 8, 0, '2019-06-21 16:10:18') ), max_priorities (emp_id, dept_id, d_date, course_id, f_priority, s_priority, deleted, fs_ord_priority, max_fs_ord_priority) AS ( SELECT emp_id, dept_id, d_date, course_id, s.f_priority, s_priority, deleted, def.f_ord_priority*10+s_priority, MAX(def.f_ord_priority*10+s_priority) OVER (PARTITION BY emp_id, dept_id, d_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM source s INNER JOIN (VALUES ('Low', 1), ('Medium', 2), ('High', 3), ('Critical', 4)) AS def(f_priority, f_ord_priority) ON s.f_priority = def.f_priority ), filtered_priorities (emp_id, dept_id, d_date, course_id, deleted, fs_ord_priority, entry_ord, entry_count, date_ord) AS ( SELECT emp_id, dept_id, d_date, course_id, deleted, fs_ord_priority, ROWNUMBER() OVER (PARTITION BY emp_id, dept_id ORDER BY d_date, course_id), COUNT(*) OVER (PARTITION BY emp_id, dept_id), COUNT(*) OVER (PARTITION BY emp_id, dept_id ORDER BY d_date) FROM max_priorities WHERE max_fs_ord_priority = fs_ord_priority ) , recursive_courses (emp_id, dept_id, d_date, course_dtl, current_fs_ord_priority, entry_ord, entry_count, date_ord) AS ( SELECT emp_id, dept_id, d_date, CAST(',' || course_id AS VARCHAR(100)), fs_ord_priority, entry_ord, entry_count, date_ord FROM filtered_priorities WHERE entry_ord = 1 UNION ALL SELECT n.emp_id, n.dept_id, n.d_date, CASE WHEN o.current_fs_ord_priority < n.fs_ord_priority THEN ',' || n.course_id WHEN o.current_fs_ord_priority > n.fs_ord_priority THEN o.course_dtl WHEN o.current_fs_ord_priority = n.fs_ord_priority AND n.deleted = 0 THEN o.course_dtl || ',' || n.course_id ELSE INSERT(o.course_dtl, LOCATE(','||n.course_id||',', o.course_dtl||','), LENGTH(','||n.course_id), '') END, CASE WHEN o.current_fs_ord_priority < n.fs_ord_priority THEN n.fs_ord_priority WHEN o.current_fs_ord_priority = n.fs_ord_priority AND n.deleted = 1 AND LOCATE(',', o.course_dtl, 2) = 0 THEN -1 ELSE o.current_fs_ord_priority END, CASE WHEN o.current_fs_ord_priority > n.fs_ord_priority THEN -1 ELSE n.entry_ord END, n.entry_count, n.date_ord FROM recursive_courses o, filtered_priorities n WHERE (o.emp_id, COALESCE(o.dept_id, 'N/A'), o.entry_ord + 1) = (n.emp_id, COALESCE(n.dept_id, 'N/A'), n.entry_ord) AND n.entry_ord <= o.entry_count ) SELECT emp_id, dept_id, CASE WHEN course_dtl != '' THEN SUBSTR(course_dtl, 2) END AS course_dtl, d_date AS s_date, COALESCE( INSERT(INSERT(CHAR( DATE(INSERT(INSERT(LEAD(d_date) OVER (PARTITION BY emp_id, dept_id ORDER BY d_date), 5, 0, '-'), 8, 0, '-')) - 1 DAY ), 5, 1, ''), 7, 1, '') , '99991231') AS e_date FROM recursive_courses WHERE entry_ord = date_ord ORDER BY emp_id, COALESCE(dept_id, ''), d_date ;
EMP_ID | 部门 ID | 课程_DTL | S_DATE | E_DATE -----: | ------: | ---------: | --------: | --------: E1 | D1 | C1 | 20190419 | 20190520 E1 | D1 | C3,C4 | 20190521 | 20190620 E1 | D1 | C4,C5 | 20190621 | 20191020 E1 | D1 | C6 | 20191021 | 20200120 E1 | D1 | C7,C8 | 20200121 | 99991231 E2 | 空| C1 | 20190419 | 20190520 E2 | 空| C3,C4 | 20190521 | 99991231 E2 | D2 | C5 | 20190621 | 20190720 E2 | D2 | 空| 20190721 | 20191020 E2 | D2 | C7 | 20191021 | 99991231
db<>在这里摆弄
但我必须做一个递归 CTE。对个别 CTE 的评论:
- max_priorities:以数值形式(f_priority 和 s_priority 组合)找到一天中的最大优先级(每个 emp_id 和 dept_id);不必考虑“删除”优先级较低的行,因为优先级较高,
- filters_priorities:每天只取最高优先级(当然还有emp_id和dept_id……),计算入场顺序,入场/日期顺序和计数(以便识别一天中的最后一个条目和每个emp_id的最后一个条目和dept_id),
- recursive_courses:根据优先级值,删除等顺序一次取一行。计算其课程(出于实际原因,始终使用前导逗号!;如果我们删除了最后一个活动课程,则优先级降至 - 1、
- last INSERTS 只是为了将下一个 d_date (YYYYMMDD) 转换为真正的 DATE,找到前一天并将其转换回 YYYYMMDD。
对于第二种选择,如果必须明确删除所有课程以标记结束,我选择了另一种方法:我试图每天找出哪些课程是有效的。我不会进一步评论它,因为我不确定你是否也对这个答案感兴趣。(它与您的“预期输出”不同。)
WITH source (d_date, emp_id, dept_id, course_id, f_priority, s_priority, deleted, d_timestamp) AS ( VALUES ('20190419', 'E1', 'D1', 'C1', 'Low', 2, 0, '2019-04-19 12:10:18'), ('20190521', 'E1', 'D1', 'C2', 'Medium', 2, 0, '2019-05-21 12:10:18'), ('20190521', 'E1', 'D1', 'C3', 'High', 4, 0, '2019-05-21 14:10:18'), ('20190521', 'E1', 'D1', 'C4', 'High', 4, 0, '2019-05-21 14:11:18'), ('20190621', 'E1', 'D1', 'C3', 'High', 4, 1, '2019-06-21 15:10:18'), ('20190621', 'E1', 'D1', 'C5', 'High', 4, 0, '2019-06-21 16:10:18'), ('20191021', 'E1', 'D1', 'C6', 'High', 8, 0, '2019-06-21 16:10:18'), ('20200121', 'E1', 'D1', 'C7', 'Critical', 4, 0, '2019-05-21 14:10:18'), ('20200121', 'E1', 'D1', 'C8', 'Critical', 4, 0, '2019-05-21 14:11:18'), ('20190419', 'E2', NULL, 'C1', 'Low', 2, 0, '2019-04-19 12:10:18'), ('20190521', 'E2', NULL, 'C2', 'Medium', 2, 0, '2019-05-21 12:10:18'), ('20190521', 'E2', NULL, 'C3', 'High', 4, 0, '2019-05-21 14:10:18'), ('20190521', 'E2', NULL, 'C4', 'High', 4, 0, '2019-05-21 14:11:18'), ('20190531', 'E2', NULL, 'C2', 'Medium', 2, 1, '2019-05-31 15:01:18'), ('20190621', 'E2', 'D2', 'C3', 'High', 4, 0, '2019-06-21 15:10:18'), ('20190621', 'E2', 'D2', 'C5', 'High', 6, 0, '2019-06-21 16:10:18'), ('20190721', 'E2', 'D2', 'C5', 'High', 6, 1, '2019-06-21 15:10:18'), ('20191021', 'E2', 'D2', 'C7', 'Low', 8, 0, '2019-06-21 16:10:18') ), all_dates (d_date) AS ( SELECT INSERT(INSERT(CHAR(min_dt+(a+b+c) DAYS),5,1,''),7,1,'') FROM ( SELECT DATE(INSERT(INSERT(MIN(d_date),5,0,'-'),8,0,'-')), DATE(INSERT(INSERT(MAX(d_date),5,0,'-'),8,0,'-')) FROM SOURCE ) AS def(min_dt, max_dt) CROSS JOIN (VALUES 0,1,2,3,4,5,6,7,8,9) AS aa(a) CROSS JOIN (VALUES 00,10,20,30,40,50,60,70,80,90) AS bb(b) CROSS JOIN (VALUES 000,100,200,300,400,500,600,700,800,900) AS cc(c) WHERE min_dt + (a+b+c) DAYS <= max_dt UNION ALL VALUES '99991231' ), all_dates_and_emp_dept_courses (emp_id, dept_id, course_id, d_date, f_priority, s_priority) AS ( SELECT def.emp_id, def.dept_id, def.course_id, d.d_date, NULLIF(LAST_VALUE(CASE WHEN s.deleted=1 THEN 'DELETED' ELSE s.f_priority END, 'IGNORE NULLS') OVER (PARTITION BY def.emp_id, def.dept_id, def.course_id ORDER BY d.d_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 'DELETED'), NULLIF(LAST_VALUE(CASE WHEN s.deleted=1 THEN -100 ELSE s.s_priority END, 'IGNORE NULLS') OVER (PARTITION BY def.emp_id, def.dept_id, def.course_id ORDER BY d.d_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), -100) FROM (SELECT emp_id, dept_id, course_id, MIN(MIN(d_date))OVER(PARTITION BY emp_id, dept_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), MAX(MAX(d_date))OVER(PARTITION BY emp_id, dept_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM source GROUP BY emp_id, dept_id, course_id ) AS def (emp_id, dept_id, course_id, min_d_date, max_d_date) INNER JOIN all_dates d ON d.d_date BETWEEN def.min_d_date AND def.max_d_date OR d.d_date = '99991231' LEFT JOIN source s ON (def.emp_id, COALESCE(def.dept_id,'N/A'), def.course_id, d.d_date) = (s.emp_id, COALESCE(s.dept_id,'N/A'), s.course_id, s.d_date) ), all_dates_and_emp_dept_priorities (emp_id, dept_id, d_date, f_priority, s_priority, max_fs_priority, course_dtl, count_changes) AS ( SELECT emp_id, dept_id, d_date, f_priority, s_priority, LAST_VALUE(f_priority||'/'||CHAR(s_priority)) OVER (PARTITION BY emp_id, dept_id, d_date ORDER BY DECODE(f_priority, 'Low', 1, 'Medium', 2, 'High', 3, 'Critical', 4) NULLS FIRST, s_priority NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LISTAGG(course_id, ','), SUM(CASE WHEN COALESCE(LAG(LISTAGG(course_id, ',')) OVER (PARTITION BY emp_id, dept_id, f_priority, s_priority ORDER BY d_date),'N/A')=COALESCE(LISTAGG(course_id, ','),'N/A') THEN 0 ELSE 1 END) OVER (PARTITION BY emp_id, dept_id, f_priority, s_priority ORDER BY d_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM all_dates_and_emp_dept_courses GROUP BY emp_id, dept_id, d_date, f_priority, s_priority ) SELECT DISTINCT emp_id, dept_id, course_dtl, MIN(d_date) OVER (PARTITION BY emp_id, dept_id, course_dtl ORDER BY count_changes RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING), MAX(d_date) OVER (PARTITION BY emp_id, dept_id, course_dtl ORDER BY count_changes RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING) FROM all_dates_and_emp_dept_priorities WHERE COALESCE(f_priority||'/'||CHAR(s_priority),'N/A')=COALESCE(max_fs_priority,'N/A') ORDER BY emp_id, dept_id, 4 ;
EMP_ID | 部门 ID | 课程_DTL | 4 | 5 -----: | ------: | ---------: | --------: | --------: E1 | D1 | C1 | 20190419 | 20190520 E1 | D1 | C3,C4 | 20190521 | 20190620 E1 | D1 | C4,C5 | 20190621 | 20191020 E1 | D1 | C6 | 20191021 | 20200120 E1 | D1 | C7,C8 | 20200121 | 99991231 E2 | D2 | C5 | 20190621 | 20190720 E2 | D2 | C3 | 20190721 | 99991231 E2 | 空| C1 | 20190419 | 20190520 E2 | 空| C3,C4 | 20190521 | 99991231
db<>在这里摆弄
推荐阅读
- regex - 正则表达式 - 引号内的引号
- swift - 如何开发现有 Xcode 项目的自定义 cocoa-pod?
- angular - Angular 不会使用浏览器后退/前进按钮呈现 facebook 帖子
- apache-spark - spark As 可以是列子查询
- sql - 将日期转换放在where子句db2 sql中
- javascript - ReactJS Axios 它不允许我将数组分配为属性
- .net - 使用 .Net 延迟签名和 Digicert 证书
- java - Java - 检查单元测试的字符串编码?
- mysql - 四舍五入 mysql 0.5 并不总是上升
- c++ - 当我尝试提取第一个元素时,链接列表不起作用