首页 > 解决方案 > SQL查询根据类别列查找连续的局部最大值、最小值

问题描述

我有以下数据集

    Customer_ID Category    FROM_DATE    TO_DATE
      1            5        1/1/2000    12/31/2001
      1            6        1/1/2002    12/31/2003
      1            5        1/1/2004    12/31/2005 
      2            7        1/1/2010    12/31/2011
      2            7        1/1/2012    12/31/2013
      2            5        1/1/2014    12/31/2015
      3            7        1/1/2010    12/31/2011
      3            7        1/5/2012    12/31/2013
      3            5        1/1/2014    12/31/2015

我想要实现的结果是为具有相同类别的客户找到连续的本地最小/最大日期,并确定日期中的任何差距:

    Customer_ID      FROM_Date      TO_Date       Category  
      1              1/1/2000      12/31/2001         5  
      1              1/1/2002      12/31/2003         6
      1              1/1/2004      12/31/2005         5
      2              1/1/2010      12/31/2013         7  
      2              1/1/2014      12/31/2015         5
      3              1/1/2010      12/31/2011         7
      3              1/5/2012      12/31/2013         7
      3              1/1/2014      12/31/2015         5

我的代码适用于客户 1(返回所有 3 行)和客户 2(返回 2 行,每个类别的最小和最大日期)但对于客户 3,它无法识别 12/31/2011 和 1/5/之间的差距2012 年第 7 类。

    Customer_ID      FROM_Date      TO_Date       Category  
      3              1/1/2010      12/31/2013         7  
      3              1/1/2014      12/31/2015         5

这是我的代码:

SELECT Customer_ID, Category, min(From_Date), max(To_Date) FROM 
(
SELECT Customer_ID, Category, From_Date,To_Date
,row_number() over (order by member_id, To_Date) - row_number() over (partition by Customer_ID order by Category) as p
  FROM FFS_SAMP
  )  X
 group by Customer_ID,Category,p
 order by Customer_ID,min(From_Date),Max(To_Date)

标签: sqltsqldatetimewindow-functionsgaps-and-islands

解决方案


这是一种差距和孤岛问题。可能最安全的方法是使用累积max()来查找与以前记录的重叠。在没有重叠的地方,就会出现一个记录“孤岛”。所以:

select customer_id, min(from_date), max(to_date), category
from (select t.*,
             sum(case when prev_to_date >= from_date then 0 else 1 end) over
                 (partition by customer_id, category
                  order by from_date
                 ) as grp
      from (select t.*,
                   max(to_date) over (partition by customer_id, category
                                      order by from_date
                                      rows between unbounded preceding and 1 preceding
                                     ) as prev_to_date
            from t
           ) t
     ) t
group by customer_id, category, grp;

推荐阅读