Creating ("forcing") an island in a gaps-and-islands problem

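Problem description

I have code that partitions my data using a gaps-and-islands approach. The data reports user activity, work time, and idle time, based on the timestamp and activity recorded. My code works well, but every once in a while a user_id records a string of activities for one application, goes idle, and then returns to the same application to record more activities. With my current code, it looks like the user spent nearly two hours in one application, when in reality there was a long idle period in between. I want to "force" the creation of an island: if the gap between activities exceeds 30 minutes, the partition should restart.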

ACTIVITY_DATE | USER_ID | APPL_ID |  PR1  |  PR2
---------------------------------------------------

11/20/2020 10:55    A     9340         1    1
11/20/2020 10:55    A     9340         2    2
11/20/2020 10:58    A     9340         3    3
11/20/2020 10:58    A     9340         4    4
11/20/2020 10:59    A     9340         5    5
11/20/2020 13:09    A     9340         6    6
11/20/2020 13:09    A     9340         7    7
11/20/2020 13:10    A     9340         8    8
11/20/2020 13:10    A     9340         9    9
11/20/2020 17:12    A     8354        10    1
11/20/2020 17:14    A     8354        11    2
11/20/2020 17:14    A     8354        12    3

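The final result needs to restart the partition in the PR2 column at the sixth row of this example, because the gap between recorded activities for the same appl_id exceeds 30 minutes: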

ACTIVITY_DATE | USER_ID | APPL_ID |  PR1  |  PR2
---------------------------------------------------

11/20/2020 10:55    A     9340         1    1
11/20/2020 10:55    A     9340         2    2
11/20/2020 10:58    A     9340         3    3
11/20/2020 10:58    A     9340         4    4
11/20/2020 10:59    A     9340         5    5
11/20/2020 13:09    A     9340         6    1
11/20/2020 13:09    A     9340         7    2
11/20/2020 13:10    A     9340         8    3
11/20/2020 13:10    A     9340         9    4
11/20/2020 17:12    A     8354        10    1
11/20/2020 17:14    A     8354        11    2
11/20/2020 17:14    A     8354        12    3

This is my current code:

    select activity_date, user_id, appl_id,
        row_number() over(partition by user_id order by activity_date) rn1,
        row_number() over(partition by user_id, appl_id order by activity_date) rn2
    from (
        select activity_date, user_id, appl_id, count(*)
        from mytable tt
        where user_id in ('A', 'B', 'C')
            and activity_date >= trunc(sysdate - 4, 'DD')
            and activity_date <= trunc(sysdate - 3, 'DD')
        group by activity_date, user_id, appl_id
    ) tt

Tags: oracle, partition, gaps-and-islands

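Solution

You can use MATCH_RECOGNIZE (available in Oracle 12c and later). In the query below, the pattern matches zero or more rows whose next row, for the same user_id and appl_id, falls within 30 minutes, followed by one closing row. MATCH_NUMBER() numbers each such match, and PR2 is then a ROW_NUMBER() restarted within each match: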

SELECT activity_date,
       user_id,
       appl_id,
       pr1,
       ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, mno ORDER BY pr1 )
         AS pr2
FROM   (
  SELECT t.*,
         ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date) AS pr1
  FROM   table_name t
)
MATCH_RECOGNIZE(
  PARTITION BY user_id, appl_id
  ORDER     BY pr1
  MEASURES
    MATCH_NUMBER() AS mno
  ALL ROWS PER MATCH
  PATTERN ( activities* last_activity )
  DEFINE activities AS
    NEXT(activity_date) <= LAST(activity_date) + INTERVAL '30' MINUTE
)
ORDER BY user_id, pr1;

Which, for the sample data:

CREATE TABLE table_name ( ACTIVITY_DATE, USER_ID, APPL_ID ) AS
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:55' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:58' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '10:59' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:09' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '13:10' HOUR TO MINUTE, 'A', 9340 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:12' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL UNION ALL
SELECT DATE '2020-11-20' + INTERVAL '17:14' HOUR TO MINUTE, 'A', 8354 FROM DUAL;

Outputs:

ACTIVITY_DATE | USER_ID | APPL_ID | PR1 | PR2
:----------------- | :-------- | ------: | --: | --:
2020-11-20 10:55:00 | A | 9340 | 1 | 1
2020-11-20 10:55:00 | A | 9340 | 2 | 2
2020-11-20 10:58:00 | A | 9340 | 3 | 3
2020-11-20 10:58:00 | A | 9340 | 4 | 4
2020-11-20 10:59:00 | A | 9340 | 5 | 5
2020-11-20 13:09:00 | A | 9340 | 6 | 1
2020-11-20 13:09:00 | A | 9340 | 7 | 2
2020-11-20 13:10:00 | A | 9340 | 8 | 3
2020-11-20 13:10:00 | A | 9340 | 9 | 4
2020-11-20 17:12:00 | A | 8354 | 10 | 1
2020-11-20 17:14:00 | A | 8354 | 11 | 2
2020-11-20 17:14:00 | A | 8354 | 12 | 3

db<>fiddle here
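
Where MATCH_RECOGNIZE is not available (Oracle versions before 12c), the same 30-minute rule can probably be expressed with LAG and a running SUM. The following is a minimal sketch under that assumption, not part of the original answer and not run against the fiddle. The aliases is_new_island and island_no are names introduced here for illustration; table_name and the other columns are those from the sample data above.

```sql
SELECT activity_date,
       user_id,
       appl_id,
       pr1,
       -- Restart the counter within each island of a given user and application.
       ROW_NUMBER() OVER ( PARTITION BY user_id, appl_id, island_no ORDER BY pr1 )
         AS pr2
FROM   (
  SELECT s.*,
         -- Running total of the flags: rows in the same island share one number.
         SUM(is_new_island) OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 )
           AS island_no
  FROM   (
    SELECT r.*,
           -- Flag a row when it comes more than 30 minutes after the previous
           -- row for the same user and application. The first row of each
           -- partition has no previous row, so the comparison is NULL and
           -- the flag falls through to 0.
           CASE
             WHEN activity_date >
                  LAG(activity_date) OVER ( PARTITION BY user_id, appl_id ORDER BY pr1 )
                  + INTERVAL '30' MINUTE
             THEN 1
             ELSE 0
           END AS is_new_island
    FROM   (
      SELECT t.*,
             ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY activity_date ) AS pr1
      FROM   table_name t
    ) r
  ) s
)
ORDER BY user_id, pr1;
```

Ordering both LAG and SUM by pr1 rather than activity_date keeps rows with identical timestamps in one consistent order across the window functions. A gap of exactly 30 minutes stays in the same island, which matches the <= comparison in the MATCH_RECOGNIZE version. For the sample data this should give the same PR2 values as the output above.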

