sql - SQL 查询 - 超过阈值的聚合总和
问题描述
假设我有这张桌子:
姓名 | 月 | 日期 | 价值 |
---|---|---|---|
n1 | 12 | 2020-12-01 | 7 |
n1 | 12 | 2020-12-05 | 9 |
n1 | 12 | 2020-12-09 | 17 |
n1 | 12 | 2020-12-14 | 8 |
n2 | 8 | 2020-08-02 | 12 |
n2 | 8 | 2020-08-08 | 7 |
n2 | 8 | 2020-08-09 | 14 |
n3 | 9 | 2020-09-01 | 5 |
n3 | 9 | 2020-09-03 | 11 |
n3 | 9 | 2020-09-07 | 10 |
n3 | 9 | 2020-09-21 | 7 |
按名称和月份(按日期排序)分组的 SQL 查询是什么,一旦 Value 字段的聚合总和大于 15,则将其选为结果的一行?每个名称的日期字段的值是唯一的。例如,所有具有 Name='n1' 的行都有不同的日期值。这也适用于 n2,n3,...。
根据上面的例子,结果应该如下:
姓名 | 月 | 价值 |
---|---|---|
n1 | 12 | 16 |
n1 | 12 | 17 |
n2 | 8 | 19 |
n3 | 9 | 16 |
n3 | 9 | 17 |
谢谢
解决方案
一种可能的方法是 SQL Server 和 Oracle 中的递归子查询。您需要决定是否需要重置当前步骤的计算。
我仍然无法理解您从输出中删除的原因n1 12 2020-12-14
,n2 8 2020-08-09
但您可以通过and isleaf = 1
从下面的示例中删除来做到这一点。
这是带有注释的查询:(UPD:为标准 SQL 递归删除select ... from( select
并重写reset_flag
了计算)
with a as ( select 'n1' as name, 12 as month, convert(date, '2020-12-01', 23) as dt, 7 as val union all select 'n1' as name, 12 as month, convert(date, '2020-12-05', 23) as dt, 9 as val union all select 'n1' as name, 12 as month, convert(date, '2020-12-09', 23) as dt, 17 as val union all select 'n1' as name, 12 as month, convert(date, '2020-12-14', 23) as dt, 8 as val union all select 'n2' as name, 8 as month, convert(date, '2020-08-02', 23) as dt, 12 as val union all select 'n2' as name, 8 as month, convert(date, '2020-08-08', 23) as dt, 7 as val union all select 'n2' as name, 8 as month, convert(date, '2020-08-09', 23) as dt, 14 as val union all select 'n3' as name, 9 as month, convert(date, '2020-09-01', 23) as dt, 5 as val union all select 'n3' as name, 9 as month, convert(date, '2020-09-03', 23) as dt, 11 as val union all select 'n3' as name, 9 as month, convert(date, '2020-09-07', 23) as dt, 10 as val union all select 'n3' as name, 9 as month, convert(date, '2020-09-21', 23) as dt, 7 as val ) , rn as ( /*Build calculation hierarchy from 1st to last*/ select name, month, dt, val, dense_rank() over(partition by name, month order by dt asc) as rn, /*To identify last item in group*/ dense_rank() over(partition by name, month order by dt desc) as rn_desc from a ) /*Simulate running sum with reset*/ , rec ( name, month, running_sum, dt_until, rn, isleaf, reset_flag ) as ( /*Start from 1st item*/ select name, month, val as running_sum, dt, rn, case when rn_desc = 1 then 1 else 0 end as isleaf, case when val > 15 then 1 else 0 end as reset_flag from rn where rn = 1 union all /*Calculate current value*/ select rec.name, rec.month, /* If we need to reset calculation, then use original value, else - add value to running total */ case when rec.running_sum > 15 then rn.val else rec.running_sum + rn.val end as running_sum, dt, rec.rn + 1 as rn, case when rn.rn_desc = 1 then 1 else 0 end as isleaf, case /*Reset on threshold violation after addition*/ when rec.running_sum <= 15 and rec.running_sum + rn.val > 15 then 1 /*Or when there was reset before and current value vuolates threshold also*/ when rn.val > 15 then 1 else 0 end as reset_flag from rec join rn on rec.name = rn.name and rec.month = rn.month and rec.rn + 1 = rn.rn ) select * from rec where isleaf = 1 or reset_flag = 1 order by 1, 2, rn asc
GO
姓名 | 月 | 运行总和 | dt_直到 | rn | 小岛 | 重置标志 :--- | ----: | ----------: | :--------- | -: | -----: | ---------: n1 | 12 | 16 | 2020-12-05 | 2 | 0 | 1 n1 | 12 | 17 | 2020-12-09 | 3 | 0 | 1 n1 | 12 | 8 | 2020-12-14 | 4 | 1 | 0 n2 | 8 | 19 | 2020-08-08 | 2 | 0 | 1 n2 | 8 | 14 | 2020-08-09 | 3 | 1 | 0 n3 | 9 | 16 | 2020-09-03 | 2 | 0 | 1 n3 | 9 | 17 | 2020-09-21 | 4 | 1 | 1
db<>fiddle here for SQL Server。而对于甲骨文来说。
(UPD:上一个查询在这个db<>fiddle中)
Oracle withMODEL
子句的另一种方式。CV
在这里,我们用函数而不是递归来重置计算。
with a as ( select 'n1' as name, 12 as month, date '2020-12-01' as dt, 7 as val from dual union all select 'n1' as name, 12 as month, date '2020-12-05' as dt, 9 as val from dual union all select 'n1' as name, 12 as month, date '2020-12-09' as dt, 17 as val from dual union all select 'n1' as name, 12 as month, date '2020-12-14' as dt, 8 as val from dual union all select 'n2' as name, 8 as month, date '2020-08-02' as dt, 12 as val from dual union all select 'n2' as name, 8 as month, date '2020-08-08' as dt, 7 as val from dual union all select 'n2' as name, 8 as month, date '2020-08-09' as dt, 14 as val from dual union all select 'n3' as name, 9 as month, date '2020-09-01' as dt, 5 as val from dual union all select 'n3' as name, 9 as month, date '2020-09-03' as dt, 11 as val from dual union all select 'n3' as name, 9 as month, date '2020-09-07' as dt, 10 as val from dual union all select 'n3' as name, 9 as month, date '2020-09-21' as dt, 7 as val from dual ) , rn as ( /*Build calculation hierarchy from 1st to last*/ select name, month, dt, val, 0 as rsum, 0 as keep_flag, dense_rank() over(partition by name, month order by dt asc) as rn from a ) , rsum as ( /*Running sum with reset*/ select * from rn model /*When to break calculation*/ partition by (name, month) /*Dimension to iterate with model*/ dimension by (rn) /*Value, running sum and the flag where we reset calculations*/ measures (val, rsum, keep_flag, dt) /*Keep null values out of calculation range to identify last row per group*/ keep nav rules update ( /*For all sequential numberer RNs in ascending order*/ rsum[rn > 0] order by rn asc /*When we still have place till 15 (e.g previous calculation of RSUM is not greater than 15), we add current value of VAL to previous RSUM. Else we need to restart */ = case when rsum[cv() - 1] <= 15 then rsum[cv() - 1] + val[cv()] else val[cv()] end, /*Again, when there's a place and we are not at the end of partition, we mark the row as participating in another aggregated row */ keep_flag[rn > 0] order by rn asc = case when rsum[cv()] <= 15 and rsum[cv() + 1] is not null then 0 else 1 end ) ) select name, month, rsum, dt from rsum /*Keep only aggregated rows or last row per group*/ where keep_flag = 1 order by name, month, rn asc
姓名 | 月 | RSUM | DT :--- | ----: | ---: | :-------- n1 | 12 | 16 | 20 年 12 月 5 日 n1 | 12 | 17 | 20 年 12 月 9 日 n1 | 12 | 8 | 20 年 12 月 14 日 n2 | 8 | 19 | 20 年 8 月 8 日 n2 | 8 | 14 | 20 年 8 月 9 日 n3 | 9 | 16 | 20 年 9 月 3 日 n3 | 9 | 17 | 20 年 9 月 21 日
db<>在这里摆弄
推荐阅读
- jenkins - Jenkins 和 GitLab 的 Git SSH 签出失败
- c++ - lambda iso std::bind 用于成员函数
- javascript - Raphaeljs 中的对象碰撞
- r - 过滤日期以在 R 中仅显示当前月份和未来 12 个月
- python - 如何用python格式化非常小的数字?
- python - 在 Python 中通过多个自定义分隔符拆分列
- java - Unicode 规范化形式说明 (Java)
- r - 如何在函数返回的函数中将值作为常量传递?
- c++ - libcurl C++:如何在 CentOS 7 上正确安装和使用
- postgresql - postgres,从 jsonb 列生成的列,过滤