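sql - (REDSHIFT) Vertical merge / FIRST_VALUE() as an aggregate
Problem description
(This is specific to Redshift, so answers should take into account its columnar nature, sort order, etc.)
When ordered by timestamp, I need to get the first non-NULL value from each column, per category.
Essentially the same as FIRST_VALUE(), but as an aggregate.
Or, COALESCE() as an aggregate.
However, Redshift lacks the niceties of later versions of PostgreSQL or of Oracle. So I'm looking for options to test against my 100-million-row import :)
(I don't like either of my options, but I'm struggling to find a better one.)
Sample input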
 category | row_timestamp | value_a | value_b | value_c
----------+---------------+---------+---------+---------
 01       | 001           | NULL    | NULL    | 4
 01       | 010           | 7       | NULL    | NULL
 01       | 100           | NULL    | 1       | 2
 01       | 999           | 6       | 3       | 6
 02       | 001           | 1       | NULL    | NULL
 02       | 010           | NULL    | 2       | NULL
 02       | 100           | NULL    | 1       | 9
 02       | 999           | 6       | 3       | 2
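To try the queries in this post against the sample, the input can be loaded into a scratch table. This is only a sketch: the question does not give the real DDL, so the column types and the distribution/sort keys here are assumptions, and the timestamps are stored as plain integers (001 becomes 1), which preserves their order.

```sql
-- Assumed schema: types and keys are guesses, not taken from the question.
CREATE TABLE mytable (
    category      VARCHAR(2),
    row_timestamp INTEGER,
    value_a       INTEGER,
    value_b       INTEGER,
    value_c       INTEGER
)
DISTKEY (category)                  -- assumption: keep each category on one slice
SORTKEY (category, row_timestamp);  -- assumption: matches the PARTITION BY / ORDER BY used here

INSERT INTO mytable (category, row_timestamp, value_a, value_b, value_c) VALUES
    ('01',   1, NULL, NULL,    4),
    ('01',  10,    7, NULL, NULL),
    ('01', 100, NULL,    1,    2),
    ('01', 999,    6,    3,    6),
    ('02',   1,    1, NULL, NULL),
    ('02',  10, NULL,    2, NULL),
    ('02', 100, NULL,    1,    9),
    ('02', 999,    6,    3,    2);
```

Whether these keys suit the real 100-million-row table depends on its other workloads, so treat them as a starting point for testing.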
Expected results
 category | value_a | value_b | value_c
----------+---------+---------+---------
 01       | 7       | 1       | 4
 02       | 1       | 2       | 9
Current solution
SELECT DISTINCT
    category,
    FIRST_VALUE(value_a IGNORE NULLS)
        OVER (PARTITION BY category
                  ORDER BY row_timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
             )
        AS value_a,
    FIRST_VALUE(value_b IGNORE NULLS)
        OVER (PARTITION BY category
                  ORDER BY row_timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
             )
        AS value_b,
    FIRST_VALUE(value_c IGNORE NULLS)
        OVER (PARTITION BY category
                  ORDER BY row_timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
             )
        AS value_c
FROM
    mytable
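It works, but the DISTINCT may be collapsing hundreds or thousands of identical rows per category. Less than ideal.
If it were only for one or two columns, this might be workable (but it's for a dozen columns, so this is horrible)...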
WITH
    sorted_value_a AS
    (
        SELECT
            category,
            value_a,
            -- non-NULL values sort first, then the earliest timestamp
            ROW_NUMBER() OVER (PARTITION BY category
                                   ORDER BY CASE WHEN value_a IS NULL THEN 1 ELSE 0 END, row_timestamp
                              )
                AS row_ordinal
        FROM
            mytable
    ),
    sorted_value_b AS
    (
        SELECT
            category,
            value_b,
            ROW_NUMBER() OVER (PARTITION BY category
                                   ORDER BY CASE WHEN value_b IS NULL THEN 1 ELSE 0 END, row_timestamp
                              )
                AS row_ordinal
        FROM
            mytable
    ),
    sorted_value_c AS
    (
        SELECT
            category,
            value_c,
            ROW_NUMBER() OVER (PARTITION BY category
                                   ORDER BY CASE WHEN value_c IS NULL THEN 1 ELSE 0 END, row_timestamp
                              )
                AS row_ordinal
        FROM
            mytable
    )
SELECT
    a.category,
    a.value_a,
    b.value_b,
    c.value_c
FROM
    sorted_value_a AS a
INNER JOIN
    sorted_value_b AS b
        ON  b.category    = a.category
        AND b.row_ordinal = 1
INNER JOIN
    sorted_value_c AS c
        ON  c.category    = a.category
        AND c.row_ordinal = 1
-- keep only the top-ranked row of each CTE, i.e. one row per category
WHERE
    a.row_ordinal = 1
Solution
Well, I don't know if this is pretty, but you can do this:
select category, value_a, value_b, value_c, value_d
from (select category,
             -- the row's own value if non-NULL, otherwise the next non-NULL value later in the category
             coalesce(value_a, lead(value_a) ignore nulls over (partition by category order by row_timestamp)) as value_a,
             coalesce(value_b, lead(value_b) ignore nulls over (partition by category order by row_timestamp)) as value_b,
             coalesce(value_c, lead(value_c) ignore nulls over (partition by category order by row_timestamp)) as value_c,
             -- value_d is not in the sample data; it stands for any further column
             coalesce(value_d, lead(value_d) ignore nulls over (partition by category order by row_timestamp)) as value_d,
             -- seqnum = 1 marks the earliest row of each category
             row_number() over (partition by category order by row_timestamp) as seqnum
      from mytable t
     ) t
where seqnum = 1;
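One more option that may be worth benchmarking, and which is not part of the answer above: for each category and column, find the earliest timestamp that carries a non-NULL value with a windowed MIN(), then pick the values with a conditional aggregate and a plain GROUP BY. This is a sketch under assumptions: row_timestamp is taken to be unique within a category (with ties, MAX() would return the largest of the tied values), only the three sample columns are shown, and the first_ts_* aliases are made up for illustration.

```sql
SELECT
    category,
    -- exactly one row per category matches each first_ts_*, so MAX() just picks it out
    MAX(CASE WHEN row_timestamp = first_ts_a THEN value_a END) AS value_a,
    MAX(CASE WHEN row_timestamp = first_ts_b THEN value_b END) AS value_b,
    MAX(CASE WHEN row_timestamp = first_ts_c THEN value_c END) AS value_c
FROM
(
    SELECT
        t.*,
        -- earliest timestamp per category at which each column is non-NULL
        MIN(CASE WHEN value_a IS NOT NULL THEN row_timestamp END)
            OVER (PARTITION BY category) AS first_ts_a,
        MIN(CASE WHEN value_b IS NOT NULL THEN row_timestamp END)
            OVER (PARTITION BY category) AS first_ts_b,
        MIN(CASE WHEN value_c IS NOT NULL THEN row_timestamp END)
            OVER (PARTITION BY category) AS first_ts_c
    FROM
        mytable t
) x
GROUP BY
    category;
```

On the sample input this gives 7, 1, 4 for category 01 and 1, 2, 9 for category 02. A column that is entirely NULL within a category comes back as NULL. Whether this beats the DISTINCT version on 100 million rows is untested.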