首页 > 解决方案 > SQL 查询 - 超过阈值的聚合总和

问题描述

假设我有这张桌子:

姓名 日期 价值
n1 12 2020-12-01 7
n1 12 2020-12-05 9
n1 12 2020-12-09 17
n1 12 2020-12-14 8
n2 8 2020-08-02 12
n2 8 2020-08-08 7
n2 8 2020-08-09 14
n3 9 2020-09-01 5
n3 9 2020-09-03 11
n3 9 2020-09-07 10
n3 9 2020-09-21 7

按名称和月份(按日期排序)分组的 SQL 查询是什么,一旦 Value 字段的聚合总和大于 15,则将其选为结果的一行?每个名称的日期字段的值是唯一的。例如,所有具有 Name='n1' 的行都有不同的日期值。这也适用于 n2,n3,...。

根据上面的例子,结果应该如下:

姓名 价值
n1 12 16
n1 12 17
n2 8 19
n3 9 16
n3 9 17

谢谢

标签: sqlsql-serveroracle

解决方案


一种可能的方法是 SQL Server 和 Oracle 中的递归子查询。您需要决定是否需要重置当前步骤的计算。

我仍然无法理解您从输出中删除的原因n1 12 2020-12-14n2 8 2020-08-09但您可以通过and isleaf = 1从下面的示例中删除来做到这一点。

这是带有注释的查询:(UPD:为标准 SQL 递归删除select ... from( select并重写reset_flag了计算)

with a as (
  select 'n1' as name, 12 as month, convert(date, '2020-12-01', 23) as dt, 7 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-05', 23) as dt, 9 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-09', 23) as dt, 17 as val union all
  select 'n1' as name, 12 as month, convert(date, '2020-12-14', 23) as dt, 8 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-02', 23) as dt, 12 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-08', 23) as dt, 7 as val union all
  select 'n2' as name, 8  as month, convert(date, '2020-08-09', 23) as dt, 14 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-01', 23) as dt, 5 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-03', 23) as dt, 11 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-07', 23) as dt, 10 as val union all
  select 'n3' as name, 9  as month, convert(date, '2020-09-21', 23) as dt, 7 as val
)
, rn as (
  /*Build calculation hierarchy from 1st to last*/
  select
    name,
    month,
    dt,
    val,
    dense_rank() over(partition by name, month order by dt asc) as rn,
    /*To identify last item in group*/
    dense_rank() over(partition by name, month order by dt desc) as rn_desc
  from a
)
/*Simulate running sum with reset*/
, rec (
  name,
  month,
  running_sum,
  dt_until,
  rn,
  isleaf,
  reset_flag
) as (
  /*Start from 1st item*/
  select
    name,
    month,
    val as running_sum,
    dt,
    rn,
    case when rn_desc = 1 then 1 else 0 end as isleaf,
    case when val > 15 then 1 else 0 end as reset_flag
  from rn
  where rn = 1
  
  union all
  
  /*Calculate current value*/
  select
    rec.name,
    rec.month,
    /*
      If we need to reset calculation,
      then use original value,
      else - add value to running total
    */
    case
      when rec.running_sum > 15
      then rn.val
      else rec.running_sum + rn.val
    end as running_sum,
    dt,
    rec.rn + 1 as rn,
    case when rn.rn_desc = 1 then 1 else 0 end as isleaf,
    case
      /*Reset on threshold violation after addition*/
      when rec.running_sum <= 15
        and rec.running_sum + rn.val > 15
      then 1
      /*Or when there was reset before and current value vuolates threshold also*/
      when rn.val > 15
      then 1
      else 0
    end as reset_flag
  from rec
    join rn
      on rec.name = rn.name
        and rec.month = rn.month
        and rec.rn + 1 = rn.rn
)
select *
from rec
where isleaf = 1
  or reset_flag = 1
order by 1, 2, rn asc
GO
姓名 | 月 | 运行总和 | dt_直到 | rn | 小岛 | 重置标志
:--- | ----: | ----------: | :--------- | -: | -----: | ---------:
n1 | 12 | 16 | 2020-12-05 | 2 | 0 | 1
n1 | 12 | 17 | 2020-12-09 | 3 | 0 | 1
n1 | 12 | 8 | 2020-12-14 | 4 | 1 | 0
n2 | 8 | 19 | 2020-08-08 | 2 | 0 | 1
n2 | 8 | 14 | 2020-08-09 | 3 | 1 | 0
n3 | 9 | 16 | 2020-09-03 | 2 | 0 | 1
n3 | 9 | 17 | 2020-09-21 | 4 | 1 | 1

db<>fiddle here for SQL Server。而对于甲骨文来说

UPD:上一个查询在这个db<>fiddle中)

Oracle withMODEL子句的另一种方式。CV在这里,我们用函数而不是递归来重置计算。

with a as (
  select 'n1' as name, 12 as month, date '2020-12-01' as dt, 7 as val   from dual union all
  select 'n1' as name, 12 as month, date '2020-12-05' as dt, 9 as val   from dual union all
  select 'n1' as name, 12 as month, date '2020-12-09' as dt, 17 as val  from dual union all
  select 'n1' as name, 12 as month, date '2020-12-14' as dt, 8 as val   from dual union all
  select 'n2' as name, 8  as month, date '2020-08-02' as dt, 12 as val  from dual union all
  select 'n2' as name, 8  as month, date '2020-08-08' as dt, 7 as val   from dual union all
  select 'n2' as name, 8  as month, date '2020-08-09' as dt, 14 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-01' as dt, 5 as val   from dual union all
  select 'n3' as name, 9  as month, date '2020-09-03' as dt, 11 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-07' as dt, 10 as val  from dual union all
  select 'n3' as name, 9  as month, date '2020-09-21' as dt, 7 as val   from dual 
)
, rn as (
  /*Build calculation hierarchy from 1st to last*/
  select
    name,
    month,
    dt,
    val,
    0 as rsum,
    0 as keep_flag,
    dense_rank() over(partition by name, month order by dt asc) as rn
  from a
)
, rsum as (
  /*Running sum with reset*/
  select *
  from rn
  model
    /*When to break calculation*/
    partition by (name, month)
    /*Dimension to iterate with model*/
    dimension by (rn)
    /*Value, running sum and the flag where we reset calculations*/
    measures (val, rsum, keep_flag, dt)
    /*Keep null values out of calculation range to identify last row per group*/
    keep nav
    rules update
    (
      /*For all sequential numberer RNs in ascending order*/
      rsum[rn > 0] order by rn asc
        /*When we still have place till 15 (e.g previous calculation of RSUM
        is not greater than 15), we add current value of VAL to previous RSUM.
        Else we need to restart
        */
        = case
            when rsum[cv() - 1] <= 15
            then rsum[cv() - 1] + val[cv()]
            else val[cv()]
          end,
      
      /*Again, when there's a place and we are not at the end of partition,
      we mark the row as participating in another aggregated row
      */
      keep_flag[rn > 0] order by rn asc
        = case
            when rsum[cv()] <= 15 and rsum[cv() + 1] is not null
            then 0
            else 1
          end
    )
)
select
  name,
  month,
  rsum,
  dt
from rsum
/*Keep only aggregated rows or last row per group*/
where keep_flag = 1
order by name, month, rn asc
姓名 | 月 | RSUM | DT       
:--- | ----: | ---: | :--------
n1 | 12 | 16 | 20 年 12 月 5 日
n1 | 12 | 17 | 20 年 12 月 9 日
n1 | 12 | 8 | 20 年 12 月 14 日
n2 | 8 | 19 | 20 年 8 月 8 日
n2 | 8 | 14 | 20 年 8 月 9 日
n3 | 9 | 16 | 20 年 9 月 3 日
n3 | 9 | 17 | 20 年 9 月 21 日

db<>在这里摆弄


推荐阅读