首页 > 解决方案 > SQL:如何在滑动窗口框架中使用线性回归或 regr_slope()?

问题描述

我有以下代码来生成示例表。

create or replace table aggr(ACCT_ID NUMBER(18,0), YYYYMMDD NUMBER(18,0), value NUMBER(10, 2));
insert into aggr values(1,0, 0);
insert into aggr values(1,30, 90);
insert into aggr values(1,60, 60);
insert into aggr values(1,90, 90);
insert into aggr values(2,0, 0);
insert into aggr values(2,30, 90);
insert into aggr values(2,60, 60);
insert into aggr values(2,90, 90);

我想要实现的是对于 each acc_idregr_slope(y, x)可以为给定前两行和当前行 wherex = YYYYMMDD和的每一行计算y = value。在这种情况下acct_id=1,对于每个帐户的前两行,由于它没有前两行,它将返回NULL。第三行,regr_slope()将使用 row1,row2,row3 来获取输出。

然后我尝试使用以下代码,但失败了

SELECT
      regr_slope(value, YYYYMMDD) over (PARTITION BY acct_id order by YYYYMMDD rows between 
2 preceding and current row) as avg_3mo_hist
FROM
    aggr

我收到以下错误:

SQL compilation error: error line 2 at position 79 Sliding window frame unsupported for function REGR_SLOPE

根据雪花文件,它说regr_slope()

When used as a window function: 
    This function does not support:

    - ORDER BY sub-clause in the OVER() clause.

    - Window frames.

我尝试了很多不同的方法,但都没有奏效。想知道如何用regr_slope()函数实现目标

标签: sqlsnowflake-cloud-data-platform

解决方案


使用窗口函数来获取“3 个月前”并不是一个安全的操作,因为如果缺少一个月,那么突然间你会覆盖 4、5 或更多个月。

相反,您可以使用自联接解决前 3 个月的聚合:

SELECT
      a.ACCT_ID, a.YYYYMMDD, ARRAY_AGG(ARRAY_CONSTRUCT(b.value, b.YYYYMMDD)) arragg
      -- over (PARTITION BY acct_id order by YYYYMMDD rows between 2 preceding and current row) as avg_3mo_hist
FROM aggr a
JOIN aggr b
ON a.acct_id=b.acct_id
-- replace with real dates and date math
AND a.YYYYMMDD BETWEEN b.YYYYMMDD AND b.YYYYMMDD+90 
GROUP BY 1, 2
ORDER BY 1, 2

在此处输入图像描述

有了这些数据,现在很容易获得所需的 3 个月斜率:

WITH window_90_day
AS (
    SELECT
          a.ACCT_ID, a.YYYYMMDD, ARRAY_AGG(ARRAY_CONSTRUCT(b.value, b.YYYYMMDD)) arragg
    FROM aggr a
    JOIN aggr b
    ON a.acct_id=b.acct_id
    -- replace with real dates and date math
    AND a.YYYYMMDD BETWEEN b.YYYYMMDD AND b.YYYYMMDD+90 
    GROUP BY 1, 2
)

SELECT ACCT_ID, YYYYMMDD
  , regr_slope(x.value[0], x.value[1]) slope
FROM window_90_day, table(flatten(arragg)) x
GROUP BY 1,2

在此处输入图像描述


推荐阅读