How to calculate a moving median with percentile_disc() and a window?

Problem description

The window below works with COUNT, AVG, and so on, but not with percentile_disc:

  SELECT
       x,
       COUNT(*) OVER w AS w_count, -- fine
       AVG(x) OVER w   AS avg_x,     -- fine
       percentile_disc(0.5) within group (order by x) OVER w AS mdn_x  -- BUG!
  FROM t
  WINDOW w AS (ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
  ORDER BY 1


It seems this is not possible... Is there a workaround? Perhaps a lateral join with a subquery.
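A minimal sketch of what such a lateral-join workaround might look like. It assumes a table t(x integer) and that rows are ordered by x (the window in the query above has no ORDER BY, so this ordering is an assumption); the names numbered, rn, and m are hypothetical. percentile_disc is used here as an ordinary ordered-set aggregate inside the subquery, which PostgreSQL does allow.

```sql
-- Assumption: t(x integer); the "previous 3 rows" are taken in order of x.
with numbered as (
    select x,
           row_number() over (order by x) as rn  -- position of each row
    from t
)
select n.x,
       m.mdn_x
from numbered n
cross join lateral (
    -- Emulate ROWS BETWEEN 3 PRECEDING AND CURRENT ROW for row n.
    select percentile_disc(0.5) within group (order by w.x) as mdn_x
    from numbered w
    where w.rn between n.rn - 3 and n.rn
) m
order by n.rn;
```

Note that percentile_disc(0.5) always returns one of the input values, so for a frame with an even number of rows it yields the lower of the two middle values rather than their average.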


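As explained here, the MEDIAN matters: in outlier analysis it is preferable to AVG. Likewise, a moving median is better (more resilient) than a moving average.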

Tags: sql, postgresql, window-functions

Solution


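I just thought of another possibility. We can create our own aggregate and use it just like the built-in avg and count.

Let's start with the accumulator (the state transition function):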

create function median_sfunc (
    state integer[], data integer
) returns integer[] as
$$
begin
    if state is null then
        return array[data];
    else
        return state || data;
    end if;
end;
$$ language plpgsql;

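Then the finisher (the final function):

create function median_ffunc (
    state integer[]
) returns double precision as
$$
declare
    sorted integer[];
    n      integer;
begin
    -- Sort the collected values so the result does not depend on the frame's row order.
    sorted := array(select v from unnest(state) as v order by v);
    n := array_length(sorted, 1);
    -- Odd n: both indexes point at the middle value; even n: the two middle values.
    return (sorted[(n + 1) / 2] + sorted[(n + 2) / 2]) / 2.;
end;
$$ language plpgsql;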
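The two index expressions in the final function rely on integer division and on PostgreSQL arrays being 1-based. A small sketch to see which positions they select for array lengths 1 to 5 (the column names lower_idx and upper_idx are made up for illustration):

```sql
-- n stands for the number of values collected in the state array.
select n,
       (n + 1) / 2 as lower_idx,  -- integer division truncates
       (n + 2) / 2 as upper_idx
from generate_series(1, 5) as n;

-- n = 1 -> 1, 1
-- n = 2 -> 1, 2
-- n = 3 -> 2, 2
-- n = 4 -> 2, 3
-- n = 5 -> 3, 3
```

For odd lengths both indexes coincide, so the average is the middle value itself; for even lengths they pick the two middle values. The result is a median only if the array is in sorted order.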

Of course, the supplier (the initial state) is simply NULL, since no initcond is given. That gives us the aggregate:

create aggregate median (integer) (
    sfunc     = median_sfunc,
    stype     = integer[],
    finalfunc = median_ffunc
    );

Now you can call it elegantly, whatever window you use:

select x,
       count(*) over w as w_count,
       avg(x) over w as avg_x,
       median(x) over w as mdn_x
from tmp t
    window w as (order by x rows between 3 preceding and current row)
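To check the aggregate end to end, here is a small sketch with made-up sample data. The table name tmp follows the query above; the five values are hypothetical, with one outlier (100) to show that the median reacts much less than the average.

```sql
-- Hypothetical sample data; assumes the median aggregate above has been created.
create temporary table tmp (x integer);
insert into tmp (x) values (1), (2), (3), (4), (100);

select x,
       count(*)  over w as w_count,
       avg(x)    over w as avg_x,
       median(x) over w as mdn_x
from tmp t
window w as (order by x rows between 3 preceding and current row)
order by x;

-- Expected rows (x, w_count, avg_x, mdn_x), with avg_x shown rounded:
--   1,   1,  1.00, 1
--   2,   2,  1.50, 1.5
--   3,   3,  2.00, 2
--   4,   4,  2.50, 2.5
--   100, 4, 27.25, 3.5
```

In the last row the frame is {2, 3, 4, 100}: the average jumps to 27.25 while the median only moves to 3.5.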
