Rolling sum of sequentially linked data

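Question

I'm working with a large legacy dataset that contains sequentially linked data. I couldn't find the words to explain it, so I drew a picture in Paint. This is of course not the real dataset, but it's close. The example contains three sequences.

(Image: record representation of the dataset)

Each record has an ID and a value. It also has a pointer to the next linked ID. Sequence length is random, and a sequence ends when the next-ID pointer is 0. Every record is used only once, in exactly one sequence, which means sequences cannot merge or split. A sequence may consist of just one record.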
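
To make the structure concrete, here is a minimal Python sketch of the linked records. The sample data and the names `ITEMS` and `sequences` are hypothetical, chosen only to mirror the description above (three sequences, one of them a single record); they are not taken from the real dataset.

```python
# Hypothetical sample data: id -> (value, next_id); next_id 0 ends a sequence.
ITEMS = {
    1: (10, 2),
    2: (20, 3),
    3: (30, 0),
    4: (5, 5),
    5: (7, 0),
    6: (100, 0),
}


def sequences(items):
    """Return each sequence as a list of ids, head first."""
    # A head is a record that no other record points to.
    pointed_to = {next_id for _, next_id in items.values() if next_id != 0}
    heads = [rec_id for rec_id in items if rec_id not in pointed_to]
    chains = []
    for head in sorted(heads):
        chain, current = [], head
        while current != 0:          # follow the pointer until it reaches 0
            chain.append(current)
            current = items[current][1]
        chains.append(chain)
    return chains


print(sequences(ITEMS))
```

Because sequences never merge or split, every record appears in exactly one of the returned lists.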

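What I need is an SQL query (SQL Server 2014) that returns the rolling sum for each record of a sequence. I'd know how to do this if the sequences shared a common identifier, but in this case they don't.

I've been able to do it in Excel (for what it's worth) by looking up the previous sum, if one exists, and adding the current value. But I can't translate that into SQL. Does anyone know where to start to reach the end goal, the "rolling sum result" column, in SQL?

(Image: example of the sequence data in Excel)

[previous sum] formula: =IFNA(INDEX([rolling sum formula],MATCH([@id],[next_pointer],0),0),0)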
[rolling sum result] formula: =[@[previous sum]]+[@value]

*The data sequences aren't sorted like in the Excel example. This just makes it easier to read in the example.
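
The two Excel formulas can be read as "find the record whose next_pointer equals my id, take its rolling sum (or 0 if there is none), then add my value". A rough Python sketch of that same lookup follows; the rows and the function name `rolling_sums` are hypothetical, and the rows are deliberately unsorted.

```python
# Hypothetical rows, deliberately unsorted: (id, value, next_pointer).
ROWS = [(3, 30, 0), (1, 10, 2), (5, 7, 0), (2, 20, 3), (6, 100, 0), (4, 5, 5)]


def rolling_sums(rows):
    """Return {id: rolling sum}, counted from the head of each sequence."""
    value = {rec_id: val for rec_id, val, _ in rows}
    # Invert the pointer: predecessor[x] is the record whose next_pointer is x.
    predecessor = {nxt: rec_id for rec_id, _, nxt in rows if nxt != 0}
    result = {}

    def total(rec_id):
        if rec_id not in result:
            prev = predecessor.get(rec_id)                         # MATCH(...)
            previous_sum = total(prev) if prev is not None else 0  # IFNA(..., 0)
            result[rec_id] = previous_sum + value[rec_id]
        return result[rec_id]

    for rec_id in value:
        total(rec_id)
    return result


print(rolling_sums(ROWS))
```

The recursion here is only for brevity; for very long sequences an iterative walk from each head would avoid Python's recursion limit.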

Tags: sql, sql-server, tsql

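Answer


You need something like a RECURSIVE query.

You can do this with a CTE. Here is a test against your data (the column you're after is "cumul"; the other columns are there to help show what's going on):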

WITH sequenza AS (
    SELECT       
        id, 
        value,
        nextid,
        id AS lastid,
        value as cumul
    FROM       
        items
    WHERE nextid = 0
    UNION ALL
    SELECT
        curr.id, 
        curr.value,
        curr.nextid,
        prev.lastid,
        prev.cumul + curr.value AS cumul
    FROM 
        items AS curr
        INNER JOIN sequenza AS prev
            ON prev.id = curr.nextid
)
SELECT * FROM sequenza
WHERE id = 31;
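
To try the recursive CTE without a SQL Server instance, the following sketch runs an equivalent query through Python's built-in `sqlite3` module. This is an approximation under stated assumptions: SQLite stands in for SQL Server (the `RECURSIVE` keyword is written here for SQLite, while SQL Server uses plain `WITH`), the `WHERE id = 31` filter is dropped so that every row is shown, and the sample rows and the name `tail_cumulative` are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (id, value, nextid); nextid 0 ends a sequence.
SAMPLE = [(1, 10, 2), (2, 20, 3), (3, 30, 0), (4, 5, 5), (5, 7, 0), (6, 100, 0)]

QUERY = """
WITH RECURSIVE sequenza(id, value, nextid, lastid, cumul) AS (
    -- anchor: the last record of every sequence
    SELECT id, value, nextid, id, value
    FROM items
    WHERE nextid = 0
    UNION ALL
    -- step: the record that points at the one already collected
    SELECT curr.id, curr.value, curr.nextid, prev.lastid, prev.cumul + curr.value
    FROM items AS curr
    INNER JOIN sequenza AS prev ON prev.id = curr.nextid
)
SELECT id, lastid, cumul FROM sequenza ORDER BY id
"""


def tail_cumulative(rows):
    """Return (id, lastid, cumul) rows; cumul is summed from the tail backwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER, nextid INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in tail_cumulative(SAMPLE):
    print(row)
```

Note that "cumul" accumulates from the end of each sequence towards its start, because the anchor is the record with nextid = 0. On SQL Server, a recursive CTE stops with an error after 100 recursion levels by default; for longer sequences the limit can be raised with `OPTION (MAXRECURSION n)` at the end of the statement.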

To do this in the opposite direction... there's probably more than one way. "cumul" runs from the tail of each chain back to the current record, so for each chain (identified by its lastid) I take the chain total and derive the head-to-record rolling sum as TOTAL - ROLLING + VALUE. For a chain with values 10, 20, 30, "cumul" is 60, 50, 30 and the result is 10, 30, 60.

So, something like:

WITH sequenza AS (
    -- anchor: the last record of every chain
    SELECT       
        id, 
        value,
        nextid,
        id AS lastid,
        value as cumul
    FROM       
        items
    WHERE nextid = 0
    UNION ALL
    -- step: the record pointing at the one already collected
    SELECT
        curr.id, 
        curr.value,
        curr.nextid,
        prev.lastid,
        prev.cumul + curr.value AS cumul
    FROM 
        items AS curr
        INNER JOIN sequenza AS prev
            ON prev.id = curr.nextid
)
SELECT sequenza.*, cirpo.total - sequenza.cumul + sequenza.value AS cumulasc FROM sequenza
JOIN (
  -- chain total per lastid
  SELECT lastid, SUM(value) AS total
  FROM sequenza
  GROUP BY lastid
) AS cirpo ON (sequenza.lastid = cirpo.lastid)
ORDER BY sequenza.lastid, cumul DESC
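
One way to sanity-check a reversed running total is to run it on a tiny dataset where the answer is obvious. The sketch below does that with Python's built-in `sqlite3` module, relying on the identity "head-to-record sum = chain total - tail-to-record sum + own value". As assumptions: SQLite stands in for SQL Server, and the sample rows and the names `totals` and `head_cumulative` are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (id, value, nextid); nextid 0 ends a sequence.
SAMPLE = [(1, 10, 2), (2, 20, 3), (3, 30, 0), (4, 5, 5), (5, 7, 0), (6, 100, 0)]

QUERY = """
WITH RECURSIVE sequenza(id, value, nextid, lastid, cumul) AS (
    SELECT id, value, nextid, id, value
    FROM items
    WHERE nextid = 0
    UNION ALL
    SELECT curr.id, curr.value, curr.nextid, prev.lastid, prev.cumul + curr.value
    FROM items AS curr
    INNER JOIN sequenza AS prev ON prev.id = curr.nextid
),
totals AS (
    -- one row per chain: the sum of all its values
    SELECT lastid, SUM(value) AS total
    FROM sequenza
    GROUP BY lastid
)
SELECT s.id, t.total - s.cumul + s.value AS cumulasc
FROM sequenza AS s
JOIN totals AS t ON t.lastid = s.lastid
ORDER BY s.lastid, s.cumul DESC
"""


def head_cumulative(rows):
    """Return (id, cumulasc) rows; cumulasc is summed from the head forwards."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, value INTEGER, nextid INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in head_cumulative(SAMPLE):
    print(row)
```

For the chain 1 → 2 → 3 with values 10, 20, 30, the rows come back as 10, 30, 60, which is what the Excel "rolling sum result" column would show. The `ORDER BY ... cumul DESC` lists each chain head first only when the values are positive.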
