首页 > 解决方案 > 加入时间段,按id分组

问题描述

我有与同一用户相交的时间段的行。例如:

-------------------------------------------------------------
|    ID_USER    |     START_DATE      |      END_DATE       |
-------------------------------------------------------------
|       1       | 01/01/2018 08:00:00 | 01/01/2018 08:50:00 |
|       1       | 01/01/2018 08:15:00 | 01/01/2018 08:20:00 |
|       1       | 01/01/2018 08:45:00 | 01/01/2018 09:55:00 |
|       1       | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
|       2       | 01/01/2018 08:45:00 | 01/01/2018 09:50:00 |
|       2       | 01/01/2018 09:15:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------

我想避免它。我想在一列中加入行,将开始日期作为最旧的日期,将结束日期作为最新的日期。上述示例的结果将是:

-------------------------------------------------------------
|    ID_USER    |     START_DATE      |      END_DATE       |
-------------------------------------------------------------
|       1       | 01/01/2018 08:00:00 | 01/01/2018 09:55:00 |
|       1       | 01/01/2018 15:45:00 | 01/01/2018 17:00:00 |
|       2       | 01/01/2018 08:45:00 | 01/01/2018 10:00:00 |
-------------------------------------------------------------

您知道如何在 Oracle 中使用 SQL 语句获得我想要的解决方案吗?

标签: sqloracleoracle11ggaps-and-islands

解决方案


您有两种类型的交叉点;第一个周期完全存在于另一个周期中(例如您的第二行,08:15-08:20),第二个周期与另一个周期的开始或结束重叠。

如果您消除了第一种类型,那么您可以使用领先和落后来窥视剩下的东西;我添加了第三个数据集以获得更多乐趣:

select id_user, start_date, end_date,
  case when start_date <= lag(end_date) over (partition by id_user order by start_date)
       then null
       else start_date
  end as calc_start_date,
  case when end_date >= lead(start_date) over (partition by id_user order by end_date)
       then null
       else end_date
  end as calc_end_date
from your_table t1
where not exists (
    select *
    from your_table t2
    where t2.id_user = t1.id_user
    and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
    and t2.rowid != t1.rowid
);
   ID_USER START_DATE          END_DATE            CALC_START_DATE     CALC_END_DATE         
---------- ------------------- ------------------- ------------------- ----------------------
         1 2018-01-01 08:00:00 2018-01-01 08:50:00 2018-01-01 08:00:00                       
         1 2018-01-01 08:45:00 2018-01-01 09:55:00                     2018-01-01 09:55:00   
         1 2018-01-01 15:45:00 2018-01-01 17:00:00 2018-01-01 15:45:00 2018-01-01 17:00:00   
         2 2018-01-01 08:45:00 2018-01-01 09:50:00 2018-01-01 08:45:00                       
         2 2018-01-01 09:15:00 2018-01-01 10:00:00                     2018-01-01 10:00:00   
         3 2018-01-01 08:00:00 2018-01-01 08:30:00 2018-01-01 08:00:00                       
         3 2018-01-01 08:15:00 2018-01-01 08:45:00                                           
         3 2018-01-01 08:45:00 2018-01-01 09:15:00                                           
         3 2018-01-01 09:00:00 2018-01-01 09:30:00                     2018-01-01 09:30:00   

not exists子句删除了第一种类型。

然后,您可以折叠剩下的内容,首先通过消除与两端重叠的行(在我的 ID 3 的额外行中),它​​们的超前值和滞后值都为空;然后再次使用前导和滞后将剩余的空值替换为其相邻行的值:

select distinct id_user,
  case when calc_start_date is null
       then lag(calc_start_date) over (partition by id_user order by start_date)
       else calc_start_date
  end as start_date,
  case when calc_end_date is null
       then lead(calc_end_date) over (partition by id_user order by end_date)
       else calc_end_date
  end as end_date
from (
  select id_user, start_date, end_date,
    case when start_date <= lag(end_date) over (partition by id_user order by start_date)
         then null
         else start_date
    end as calc_start_date,
    case when end_date >= lead(start_date) over (partition by id_user order by end_date)
         then null
         else end_date
   end as calc_end_date
  from your_table t1
  where not exists (
      select *
      from your_table t2
      where t2.id_user = t1.id_user
      and t2.start_date <= t1.start_date and t2.end_date >= t1.end_date
      and t2.rowid != t1.rowid
  )
)
where calc_start_date is not null
or calc_end_date is not null
order by id_user, start_date, end_date;
   ID_USER START_DATE          END_DATE           
---------- ------------------- -------------------
         1 2018-01-01 08:00:00 2018-01-01 09:55:00
         1 2018-01-01 15:45:00 2018-01-01 17:00:00
         2 2018-01-01 08:45:00 2018-01-01 10:00:00
         3 2018-01-01 08:00:00 2018-01-01 09:30:00

如果有一些我没有考虑过并且会导致问题的边缘情况,我不会完全感到惊讶,但无论如何希望这将是一个起点。


推荐阅读