首页 > 解决方案 > 将维度实体的历史时期合并为一个

问题描述

我有一个渐变维度类型 2,其行相同(开始和结束日期除外)。如何编写一个漂亮的 SQL 查询来合并相同且具有连接时间段的行?

当前数据

+-------------+---------------------+--------------+------------+
| DimensionID | DimensionAttribute  | RowStartDate | RowEndDate |
+-------------+---------------------+--------------+------------+
|           1 | SomeValue           | 2019-01-01   | 2019-01-31 |
|           1 | SomeValue           | 2019-02-01   | 2019-02-28 |
|           1 | AnotherValue        | 2019-03-01   | 2019-03-31 |
|           1 | SomeValue           | 2019-04-01   | 2019-04-30 |
|           1 | SomeValue           | 2019-05-01   | 2019-05-31 |
|           2 | SomethingElse       | 2019-01-01   | 2019-01-31 |
|           2 | SomethingElse       | 2019-02-01   | 2019-02-28 |
|           2 | SomethingElse       | 2019-03-01   | 2019-03-31 |
|           2 | CompletelyDifferent | 2019-04-01   | 2019-04-30 |
|           2 | SomethingElse       | 2019-05-01   | 2019-05-31 |
+-------------+---------------------+--------------+------------+

结果

+-------------+---------------------+--------------+------------+
| DimensionID | DimensionAttribute  | RowStartDate | RowEndDate |
+-------------+---------------------+--------------+------------+
|           1 | SomeValue           | 2019-01-01   | 2019-02-28 |
|           1 | AnotherValue        | 2019-03-01   | 2019-03-31 |
|           1 | SomeValue           | 2019-04-01   | 2019-05-31 |
|           2 | SomethingElse       | 2019-01-01   | 2019-03-31 |
|           2 | CompletelyDifferent | 2019-04-01   | 2019-04-30 |
|           2 | SomethingElse       | 2019-05-01   | 2019-05-31 |
+-------------+---------------------+--------------+------------+

标签: sqltsqlbusiness-intelligencedimension

解决方案


对于这个版本的问题,我会使用lag()来确定组的开始位置,然后是累积总和和聚合:

select dimensionid, DimensionAttribute,
       min(row_start_date), max(row_end_date)
from (select t.*,
             sum(case when prev_red = dateadd(day, -1, row_start_date)
                      then 0 else 1
                 end) over (partition by dimensionid, DimensionAttribute order by row_start_date) as grp
      from (select t.*, 
                   lag(row_end_date) over (partition by dimensionid, DimensionAttribute order by row_start_date) as prev_red
            from t 
           ) t
     ) t
group by dimensionid, DimensionAttribute, grp;

特别是,这将识别行中的间隙。它只会在它们完全组合在一起时组合行——前一个结束日期是开始日期的前一天。当然,这可以进行调整,以允许 1 或 2 天的间隔或允许重叠。


推荐阅读