首页 > 解决方案 > DB2 SQL:考虑优先级和删除指示符的 LISTAGG

问题描述

我有一个如下所示的数据集(数据集按时间戳排序)。在下面的 EMP_ID 和 DEPT_ID 组合数据集中,可以分配不同的课程。我们必须找出在 EMP_ID 和 DEPT_ID 组合的日期范围内有效的所有课程。

我们考虑了 F_PRIORITY(第一优先级)和 S_PRIORITY(第二优先级)属性:

我想在 DB2 数据库中计算所有这些。如果简单的查询无济于事,请告诉我是否有任何功能可以提供帮助。

源数据集:

D_DATE     EMP_ID  DEPT_ID  COURSE_ID F_PRIORITY  S_PRIORITY DELETED  D_TIMESTAMP
20190419   E1      D1       C1        Low         2          0        2019-04-19 12:10:18
20190521   E1      D1       C2        Medium      2          0        2019-05-21 12:10:18
20190521   E1      D1       C3        High        4          0        2019-05-21 14:10:18
20190521   E1      D1       C4        High        4          0        2019-05-21 14:11:18
20190621   E1      D1       C3        High        4          1        2019-06-21 15:10:18
20190621   E1      D1       C5        High        4          0        2019-06-21 16:10:18
20191021   E1      D1       C6        High        8          0        2019-06-21 16:10:18
20200121   E1      D1       C7        Critical    4          0        2019-05-21 14:10:18
20200121   E1      D1       C8        Critical    4          0        2019-05-21 14:11:18

20190419   E2      Null     C1        Low         2          0        2019-04-19 12:10:18
20190521   E2      Null     C2        Medium      2          0        2019-05-21 12:10:18
20190521   E2      Null     C3        High        4          0        2019-05-21 14:10:18
20190521   E2      Null     C4        High        4          0        2019-05-21 14:11:18
20190531   E2      Null     C2        Medium      2          1        2019-05-31 15:01:18

20190621   E2      D2       C3        High        4          0        2019-06-21 15:10:18
20190621   E2      D2       C5        High        6          0        2019-06-21 16:10:18
20190721   E2      D2       C5        High        6          1        2019-06-21 15:10:18
20191021   E2      D2       C7        Low         8          0        2019-06-21 16:10:18

预期输出:

EMP_ID  DEPT_ID  COURSE_DTL  S_DATE      E_DATE
E1      D1       C1          20190419    20190520
E1      D1       C3,C4       20190521    20190620 (C2 not considered as Medium)
E1      D1       C4,C5       20190621    20191020 (C3 Deleted, so removed)
E1      D1       C6          20191021    20200120 (C6 takes priority as S_PRIORITY=8)
E1      D1       C7,C8       20200121    99991231 (C7, C8 F_PRIORITY=Critical; so takes priority)

E2      Null     C1          20190419    20190520
E2      Null     C3,C4       20190521    99991231 (addition of C2 does not impact as F_PRIORITY is less than C3 and C4. Also DELETED of C2 does not impact as C2 was not there in the consideration from Date=20190521)

E2      D2       C5          20190621    20190720
E2      D2       Null        20190721    20191020
E2      D2       C7          20191021    99991231

标签: sqldb2

解决方案


你没有给我一个关于更高优先级压倒低优先级有效性的答案,所以我做了两个例子。在第一个示例中,较高优先级的事件“删除”较低优先级的事件,与您的预期输出一致。

WITH source (d_date, emp_id, dept_id, course_id, f_priority, s_priority, deleted, d_timestamp) AS
( VALUES ('20190419', 'E1', 'D1', 'C1', 'Low',      2, 0, '2019-04-19 12:10:18'),
         ('20190521', 'E1', 'D1', 'C2', 'Medium',   2, 0, '2019-05-21 12:10:18'),
         ('20190521', 'E1', 'D1', 'C3', 'High',     4, 0, '2019-05-21 14:10:18'),
         ('20190521', 'E1', 'D1', 'C4', 'High',     4, 0, '2019-05-21 14:11:18'),
         ('20190621', 'E1', 'D1', 'C3', 'High',     4, 1, '2019-06-21 15:10:18'),
         ('20190621', 'E1', 'D1', 'C5', 'High',     4, 0, '2019-06-21 16:10:18'),
         ('20191021', 'E1', 'D1', 'C6', 'High',     8, 0, '2019-06-21 16:10:18'),
         ('20200121', 'E1', 'D1', 'C7', 'Critical', 4, 0, '2019-05-21 14:10:18'),
         ('20200121', 'E1', 'D1', 'C8', 'Critical', 4, 0, '2019-05-21 14:11:18'),
         ('20190419', 'E2', NULL, 'C1', 'Low',      2, 0, '2019-04-19 12:10:18'),
         ('20190521', 'E2', NULL, 'C2', 'Medium',   2, 0, '2019-05-21 12:10:18'),
         ('20190521', 'E2', NULL, 'C3', 'High',     4, 0, '2019-05-21 14:10:18'),
         ('20190521', 'E2', NULL, 'C4', 'High',     4, 0, '2019-05-21 14:11:18'),
         ('20190531', 'E2', NULL, 'C2', 'Medium',   2, 1, '2019-05-31 15:01:18'),
         ('20190621', 'E2', 'D2', 'C3', 'High',     4, 0, '2019-06-21 15:10:18'),
         ('20190621', 'E2', 'D2', 'C5', 'High',     6, 0, '2019-06-21 16:10:18'),
         ('20190721', 'E2', 'D2', 'C5', 'High',     6, 1, '2019-06-21 15:10:18'),
         ('20191021', 'E2', 'D2', 'C7', 'Low',      8, 0, '2019-06-21 16:10:18')
), max_priorities (emp_id, dept_id, d_date, course_id, f_priority, s_priority, deleted, fs_ord_priority, max_fs_ord_priority) AS
( SELECT emp_id, dept_id, d_date, course_id, s.f_priority, s_priority, deleted, def.f_ord_priority*10+s_priority,
         MAX(def.f_ord_priority*10+s_priority) OVER (PARTITION BY emp_id, dept_id, d_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM source s
      INNER JOIN (VALUES ('Low', 1), ('Medium', 2), ('High', 3), ('Critical', 4)) AS def(f_priority, f_ord_priority) ON s.f_priority = def.f_priority
), filtered_priorities (emp_id, dept_id, d_date, course_id, deleted, fs_ord_priority, entry_ord, entry_count, date_ord) AS
( SELECT emp_id, dept_id, d_date, course_id, deleted, fs_ord_priority,
         ROWNUMBER() OVER (PARTITION BY emp_id, dept_id ORDER BY d_date, course_id),
         COUNT(*) OVER (PARTITION BY emp_id, dept_id),
         COUNT(*) OVER (PARTITION BY emp_id, dept_id ORDER BY d_date)
    FROM max_priorities
    WHERE max_fs_ord_priority = fs_ord_priority
)
, recursive_courses (emp_id, dept_id, d_date, course_dtl, current_fs_ord_priority, entry_ord, entry_count, date_ord) AS
( SELECT emp_id, dept_id, d_date, CAST(',' || course_id AS VARCHAR(100)), fs_ord_priority, entry_ord, entry_count, date_ord
    FROM filtered_priorities
    WHERE entry_ord = 1
  UNION ALL
  SELECT n.emp_id, n.dept_id, n.d_date,
         CASE WHEN o.current_fs_ord_priority < n.fs_ord_priority THEN ',' || n.course_id
              WHEN o.current_fs_ord_priority > n.fs_ord_priority THEN o.course_dtl
              WHEN o.current_fs_ord_priority = n.fs_ord_priority AND n.deleted = 0 THEN o.course_dtl || ',' || n.course_id
              ELSE INSERT(o.course_dtl, LOCATE(','||n.course_id||',', o.course_dtl||','), LENGTH(','||n.course_id), '')
         END,
         CASE WHEN o.current_fs_ord_priority < n.fs_ord_priority THEN n.fs_ord_priority
              WHEN o.current_fs_ord_priority = n.fs_ord_priority AND n.deleted = 1 AND LOCATE(',', o.course_dtl, 2) = 0 THEN -1
              ELSE o.current_fs_ord_priority
         END,
         CASE WHEN o.current_fs_ord_priority > n.fs_ord_priority THEN -1 ELSE n.entry_ord END,
         n.entry_count, n.date_ord
    FROM recursive_courses o, filtered_priorities n
    WHERE (o.emp_id, COALESCE(o.dept_id, 'N/A'), o.entry_ord + 1) = (n.emp_id, COALESCE(n.dept_id, 'N/A'), n.entry_ord)
      AND n.entry_ord <= o.entry_count
)
SELECT emp_id, dept_id, CASE WHEN course_dtl != '' THEN SUBSTR(course_dtl, 2) END AS course_dtl,
       d_date AS s_date,
       COALESCE( INSERT(INSERT(CHAR(
         DATE(INSERT(INSERT(LEAD(d_date) OVER (PARTITION BY emp_id, dept_id ORDER BY d_date), 5, 0, '-'), 8, 0, '-')) - 1 DAY
                 ), 5, 1, ''), 7, 1, '')
                , '99991231') AS e_date
  FROM recursive_courses
  WHERE entry_ord = date_ord
  ORDER BY emp_id, COALESCE(dept_id, ''), d_date
;
EMP_ID | 部门 ID | 课程_DTL | S_DATE | E_DATE
-----: | ------: | ---------: | --------: | --------:
    E1 | D1 | C1 | 20190419 | 20190520
    E1 | D1 | C3,C4 | 20190521 | 20190620
    E1 | D1 | C4,C5 | 20190621 | 20191020
    E1 | D1 | C6 | 20191021 | 20200120
    E1 | D1 | C7,C8 | 20200121 | 99991231
    E2 |    | C1 | 20190419 | 20190520
    E2 |    | C3,C4 | 20190521 | 99991231
    E2 | D2 | C5 | 20190621 | 20190720
    E2 | D2 |       | 20190721 | 20191020
    E2 | D2 | C7 | 20191021 | 99991231

db<>在这里摆弄

但我必须做一个递归 CTE。对个别 CTE 的评论:

  • max_priorities:以数值形式(f_priority 和 s_priority 组合)找到一天中的最大优先级(每个 emp_id 和 dept_id);不必考虑“删除”优先级较低的行,因为优先级较高,
  • filters_priorities:每天只取最高优先级(当然还有emp_id和dept_id……),计算入场顺序,入场/日期顺序和计数(以便识别一天中的最后一个条目和每个emp_id的最后一个条目和dept_id),
  • recursive_courses:根据优先级值,删除等顺序一次取一行。计算其课程(出于实际原因,始终使用前导逗号!;如果我们删除了最后一个活动课程,则优先级降至 - 1、
  • last INSERTS 只是为了将下一个 d_date (YYYYMMDD) 转换为真正的 DATE,找到前一天并将其转换回 YYYYMMDD。

对于第二种选择,如果必须明确删除所有课程以标记结束,我选择了另一种方法:我试图每天找出哪些课程是有效的。我不会进一步评论它,因为我不确定你是否也对这个答案感兴趣。(它与您的“预期输出”不同。)

WITH source (d_date, emp_id, dept_id, course_id, f_priority, s_priority, deleted, d_timestamp) AS
( VALUES ('20190419', 'E1', 'D1', 'C1', 'Low',      2, 0, '2019-04-19 12:10:18'),
         ('20190521', 'E1', 'D1', 'C2', 'Medium',   2, 0, '2019-05-21 12:10:18'),
         ('20190521', 'E1', 'D1', 'C3', 'High',     4, 0, '2019-05-21 14:10:18'),
         ('20190521', 'E1', 'D1', 'C4', 'High',     4, 0, '2019-05-21 14:11:18'),
         ('20190621', 'E1', 'D1', 'C3', 'High',     4, 1, '2019-06-21 15:10:18'),
         ('20190621', 'E1', 'D1', 'C5', 'High',     4, 0, '2019-06-21 16:10:18'),
         ('20191021', 'E1', 'D1', 'C6', 'High',     8, 0, '2019-06-21 16:10:18'),
         ('20200121', 'E1', 'D1', 'C7', 'Critical', 4, 0, '2019-05-21 14:10:18'),
         ('20200121', 'E1', 'D1', 'C8', 'Critical', 4, 0, '2019-05-21 14:11:18'),
         ('20190419', 'E2', NULL, 'C1', 'Low',      2, 0, '2019-04-19 12:10:18'),
         ('20190521', 'E2', NULL, 'C2', 'Medium',   2, 0, '2019-05-21 12:10:18'),
         ('20190521', 'E2', NULL, 'C3', 'High',     4, 0, '2019-05-21 14:10:18'),
         ('20190521', 'E2', NULL, 'C4', 'High',     4, 0, '2019-05-21 14:11:18'),
         ('20190531', 'E2', NULL, 'C2', 'Medium',   2, 1, '2019-05-31 15:01:18'),
         ('20190621', 'E2', 'D2', 'C3', 'High',     4, 0, '2019-06-21 15:10:18'),
         ('20190621', 'E2', 'D2', 'C5', 'High',     6, 0, '2019-06-21 16:10:18'),
         ('20190721', 'E2', 'D2', 'C5', 'High',     6, 1, '2019-06-21 15:10:18'),
         ('20191021', 'E2', 'D2', 'C7', 'Low',      8, 0, '2019-06-21 16:10:18')
), all_dates (d_date) AS
( SELECT INSERT(INSERT(CHAR(min_dt+(a+b+c)  DAYS),5,1,''),7,1,'')
    FROM ( SELECT DATE(INSERT(INSERT(MIN(d_date),5,0,'-'),8,0,'-')), DATE(INSERT(INSERT(MAX(d_date),5,0,'-'),8,0,'-')) FROM SOURCE ) AS def(min_dt, max_dt)
      CROSS JOIN (VALUES 0,1,2,3,4,5,6,7,8,9) AS aa(a)
      CROSS JOIN (VALUES 00,10,20,30,40,50,60,70,80,90) AS bb(b)
      CROSS JOIN (VALUES 000,100,200,300,400,500,600,700,800,900) AS cc(c)
    WHERE min_dt + (a+b+c) DAYS <= max_dt
  UNION ALL
  VALUES '99991231'
), all_dates_and_emp_dept_courses (emp_id, dept_id, course_id, d_date, f_priority, s_priority) AS
( SELECT def.emp_id, def.dept_id, def.course_id, d.d_date,
         NULLIF(LAST_VALUE(CASE WHEN s.deleted=1 THEN 'DELETED' ELSE s.f_priority END, 'IGNORE NULLS') OVER (PARTITION BY def.emp_id, def.dept_id, def.course_id ORDER BY d.d_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 'DELETED'),
         NULLIF(LAST_VALUE(CASE WHEN s.deleted=1 THEN -100 ELSE s.s_priority END, 'IGNORE NULLS') OVER (PARTITION BY def.emp_id, def.dept_id, def.course_id ORDER BY d.d_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), -100)
    FROM (SELECT emp_id, dept_id, course_id,
                 MIN(MIN(d_date))OVER(PARTITION BY emp_id, dept_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
                 MAX(MAX(d_date))OVER(PARTITION BY emp_id, dept_id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
            FROM source
            GROUP BY emp_id, dept_id, course_id
         ) AS def (emp_id, dept_id, course_id, min_d_date, max_d_date)
      INNER JOIN all_dates d ON d.d_date BETWEEN def.min_d_date AND def.max_d_date OR d.d_date = '99991231'
      LEFT JOIN source s ON (def.emp_id, COALESCE(def.dept_id,'N/A'), def.course_id, d.d_date) = (s.emp_id, COALESCE(s.dept_id,'N/A'), s.course_id, s.d_date)
), all_dates_and_emp_dept_priorities (emp_id, dept_id, d_date, f_priority, s_priority, max_fs_priority, course_dtl, count_changes) AS
( SELECT emp_id, dept_id, d_date,
         f_priority, s_priority, LAST_VALUE(f_priority||'/'||CHAR(s_priority)) OVER (PARTITION BY emp_id, dept_id, d_date ORDER BY DECODE(f_priority, 'Low', 1, 'Medium', 2, 'High', 3, 'Critical', 4) NULLS FIRST, s_priority NULLS FIRST ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING),
         LISTAGG(course_id, ','),
         SUM(CASE WHEN COALESCE(LAG(LISTAGG(course_id, ',')) OVER (PARTITION BY emp_id, dept_id, f_priority, s_priority ORDER BY d_date),'N/A')=COALESCE(LISTAGG(course_id, ','),'N/A') THEN 0 ELSE 1 END) OVER (PARTITION BY emp_id, dept_id, f_priority, s_priority ORDER BY d_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    FROM all_dates_and_emp_dept_courses
    GROUP BY emp_id, dept_id, d_date, f_priority, s_priority
)
SELECT DISTINCT emp_id, dept_id, course_dtl,
       MIN(d_date) OVER (PARTITION BY emp_id, dept_id, course_dtl ORDER BY count_changes RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING),
       MAX(d_date) OVER (PARTITION BY emp_id, dept_id, course_dtl ORDER BY count_changes RANGE BETWEEN 0 PRECEDING AND 0 FOLLOWING)
  FROM all_dates_and_emp_dept_priorities
  WHERE COALESCE(f_priority||'/'||CHAR(s_priority),'N/A')=COALESCE(max_fs_priority,'N/A')
ORDER BY emp_id, dept_id, 4
;
EMP_ID | 部门 ID | 课程_DTL | 4 | 5
-----: | ------: | ---------: | --------: | --------:
    E1 | D1 | C1 | 20190419 | 20190520
    E1 | D1 | C3,C4 | 20190521 | 20190620
    E1 | D1 | C4,C5 | 20190621 | 20191020
    E1 | D1 | C6 | 20191021 | 20200120
    E1 | D1 | C7,C8 | 20200121 | 99991231
    E2 | D2 | C5 | 20190621 | 20190720
    E2 | D2 | C3 | 20190721 | 99991231
    E2 |    | C1 | 20190419 | 20190520
    E2 |    | C3,C4 | 20190521 | 99991231

db<>在这里摆弄


推荐阅读