sql - SQL如果缺少日期,如何填写时间序列的最后一个值
问题描述
例如:我有下表。
WITH
-- your input ....
input(t,grp,value) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
如您所见,此表中缺少日期 2020-30-05 和 2020-31-05。因此,有必要使用按 GROUP 分组的 2020-29-05 信息填写这些日期。此外,今天的日期大于数据中的日期(06-03 vs 06-08)(因此在本月这些观察结果缺失。因此最终输出应该是这样的:
date2 Group number
2020-28-05 00:00:00 A 55
2020-28-05 00:00:00 B 1.09
2020-28-05 00:00:00 C 1.8
2020-29-05 00:00:00 A 68
2020-29-05 00:00:00 B 1.9
2020-29-05 00:00:00 C 1.19
2020-30-05 00:00:00 A 68
2020-30-05 00:00:00 B 1.9
2020-30-05 00:00:00 C 1.19
2020-31-05 00:00:00 A 68
2020-31-05 00:00:00 B 1.9
2020-31-05 00:00:00 C 1.19
2020-01-06 00:00:00 A 10
2020-01-06 00:00:00 B 15
2020-01-06 00:00:00 C 0.88
2020-02-06 00:00:00 A 22
2020-02-06 00:00:00 B 15
2020-02-06 00:00:00 C 13
2020-03-06 00:00:00 A 66
2020-03-06 00:00:00 B 88
2020-03-06 00:00:00 C 99
And for periods 03-06 till 08-06 the same values
2020-08-06 00:00:00 A 66
2020-08-06 00:00:00 B 88
2020-08-06 00:00:00 C 99
以下代码有助于查找日期中的缺失值,但是这些空白并未填补今天的日期。如何解决?
SELECT ts AS t, grp, TS_FIRST_VALUE(value,'const') AS value
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY t)
ORDER BY 1,2
解决方案
它被称为INTERPOLATE
而不是EXTRAPOLATE,这就是挑战。
您需要将每组的最后一行添加到输入表中,但使用今天的日期而不是实际/原始日期。请注意我在下面使用的padding
和公用表表达式。padded
Vertica 有我在这里使用的分析限制子句LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC)
: ..
WITH
input(tmstmp,grp,nbr) AS (
SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
,
padding AS (
SELECT
CURRENT_DATE::timestamp
, grp
, nbr
FROM input
LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC)
)
,
padded AS (
SELECT * FROM input
UNION ALL
SELECT * FROM padding
)
SELECT
ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM padded
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out tmstmp | grp | nbr
-- out ---------------------+-----+-------
-- out 2020-05-28 00:00:00 | A | 55.00
-- out 2020-05-28 00:00:00 | B | 1.09
-- out 2020-05-28 00:00:00 | C | 1.80
-- out 2020-05-29 00:00:00 | A | 68.00
-- out 2020-05-29 00:00:00 | B | 1.90
-- out 2020-05-29 00:00:00 | C | 1.19
-- out 2020-05-30 00:00:00 | A | 68.00
-- out 2020-05-30 00:00:00 | B | 1.90
-- out 2020-05-30 00:00:00 | C | 1.19
-- out 2020-05-31 00:00:00 | A | 68.00
-- out 2020-05-31 00:00:00 | B | 1.90
-- out 2020-05-31 00:00:00 | C | 1.19
-- out 2020-06-01 00:00:00 | A | 10.00
-- out 2020-06-01 00:00:00 | B | 15.00
-- out 2020-06-01 00:00:00 | C | 0.88
-- out 2020-06-02 00:00:00 | A | 22.00
-- out 2020-06-02 00:00:00 | B | 15.00
-- out 2020-06-02 00:00:00 | C | 13.00
-- out 2020-06-03 00:00:00 | A | 66.00
-- out 2020-06-03 00:00:00 | B | 88.00
-- out 2020-06-03 00:00:00 | C | 99.00
-- out 2020-06-04 00:00:00 | A | 66.00
-- out 2020-06-04 00:00:00 | B | 88.00
-- out 2020-06-04 00:00:00 | C | 99.00
-- out 2020-06-05 00:00:00 | A | 66.00
-- out 2020-06-05 00:00:00 | B | 88.00
-- out 2020-06-05 00:00:00 | C | 99.00
-- out 2020-06-06 00:00:00 | A | 66.00
-- out 2020-06-06 00:00:00 | B | 88.00
-- out 2020-06-06 00:00:00 | C | 99.00
-- out 2020-06-07 00:00:00 | A | 66.00
-- out 2020-06-07 00:00:00 | B | 88.00
-- out 2020-06-07 00:00:00 | C | 99.00
-- out 2020-06-08 00:00:00 | A | 66.00
-- out 2020-06-08 00:00:00 | B | 88.00
-- out 2020-06-08 00:00:00 | C | 99.00
-- out 2020-06-09 00:00:00 | A | 66.00
-- out 2020-06-09 00:00:00 | B | 88.00
-- out 2020-06-09 00:00:00 | C | 99.00
推荐阅读
- image - 图像宽高对迁移学习模型精度的影响
- android - 特定用户的 Firebase 数据库集合
- c# - EF Core ChangeTracker() 修改实体的当前值和原始值相同
- postgresql - docker compose后Postgresql连接失败
- mysql - 获取表的所有列并用 CASE 语句中的值替换一列
- authorization - 如何在 Cognito 中使用自定义权限
- css - 基于剪辑路径计算的 100% 半径是多少
- assembly - 我想知道如何计算计算机系统中给定虚拟机命令的跳转指令
- stack - 在 x86-64 上启用长模式和分页后,堆栈指针是否包含虚拟地址或物理地址
- android - 关于Android后台服务的基本&常见疑问