首页 > 解决方案 > SQL如果缺少日期,如何填写时间序列的最后一个值

问题描述

例如:我有下表。

WITH
-- your input ....
input(t,grp,value) AS (
          SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)

如您所见,此表中缺少日期 2020-30-05 和 2020-31-05。因此,有必要使用按 GROUP 分组的 2020-29-05 信息填写这些日期。此外,今天的日期大于数据中的日期(06-03 vs 06-08)(因此在本月这些观察结果缺失。因此最终输出应该是这样的:

date2                     Group    number
2020-28-05 00:00:00         A        55
2020-28-05 00:00:00         B        1.09
2020-28-05 00:00:00         C        1.8
2020-29-05 00:00:00         A        68
2020-29-05 00:00:00         B        1.9
2020-29-05 00:00:00         C        1.19
2020-30-05 00:00:00         A        68
2020-30-05 00:00:00         B        1.9
2020-30-05 00:00:00         C        1.19
2020-31-05 00:00:00         A        68
2020-31-05 00:00:00         B        1.9
2020-31-05 00:00:00         C        1.19
2020-01-06 00:00:00         A        10
2020-01-06 00:00:00         B        15
2020-01-06 00:00:00         C        0.88
2020-02-06 00:00:00         A        22
2020-02-06 00:00:00         B        15
2020-02-06 00:00:00         C        13
2020-03-06 00:00:00         A        66
2020-03-06 00:00:00         B        88
2020-03-06 00:00:00         C        99
And for periods 03-06 till 08-06 the same values

2020-08-06 00:00:00         A        66
2020-08-06 00:00:00         B        88
2020-08-06 00:00:00         C        99

以下代码有助于查找日期中的缺失值,但是这些空白并未填补今天的日期。如何解决?


SELECT ts AS t, grp, TS_FIRST_VALUE(value,'const') AS value
FROM input
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY t)
ORDER BY 1,2


标签: sqlvertica

解决方案


它被称为INTERPOLATE而不是EXTRAPOLATE,这就是挑战。

您需要将每组的最后一行添加到输入表中,但使用今天的日期而不是实际/原始日期。请注意我在下面使用的padding和公用表表达式。paddedVertica 有我在这里使用的分析限制子句LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC): ..

WITH
input(tmstmp,grp,nbr) AS (
          SELECT TIMESTAMP '2020-05-28 00:00:00','A',55
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','B',1.09
UNION ALL SELECT TIMESTAMP '2020-05-28 00:00:00','C',1.8
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','A',68
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','B',1.9
UNION ALL SELECT TIMESTAMP '2020-05-29 00:00:00','C',1.19
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','A',10
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-01 00:00:00','C',0.88
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','A',22
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','B',15
UNION ALL SELECT TIMESTAMP '2020-06-02 00:00:00','C',13
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','A',66
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','B',88
UNION ALL SELECT TIMESTAMP '2020-06-03 00:00:00','C',99
)
,
padding AS (
  SELECT
    CURRENT_DATE::timestamp
  , grp
  , nbr
  FROM input
  LIMIT 1 OVER(PARTITION BY grp ORDER BY tmstmp DESC)
)
,
padded AS (
  SELECT * FROM input
  UNION ALL
  SELECT * FROM padding
)
SELECT
  ts AS tmstmp
, grp
, TS_FIRST_VALUE(nbr,'const') AS nbr
FROM padded
TIMESERIES ts AS '1 DAY' OVER(PARTITION BY grp ORDER BY tmstmp)
ORDER BY 1,2
;
-- out        tmstmp        | grp |  nbr  
-- out ---------------------+-----+-------
-- out  2020-05-28 00:00:00 | A   | 55.00
-- out  2020-05-28 00:00:00 | B   |  1.09
-- out  2020-05-28 00:00:00 | C   |  1.80
-- out  2020-05-29 00:00:00 | A   | 68.00
-- out  2020-05-29 00:00:00 | B   |  1.90
-- out  2020-05-29 00:00:00 | C   |  1.19
-- out  2020-05-30 00:00:00 | A   | 68.00
-- out  2020-05-30 00:00:00 | B   |  1.90
-- out  2020-05-30 00:00:00 | C   |  1.19
-- out  2020-05-31 00:00:00 | A   | 68.00
-- out  2020-05-31 00:00:00 | B   |  1.90
-- out  2020-05-31 00:00:00 | C   |  1.19
-- out  2020-06-01 00:00:00 | A   | 10.00
-- out  2020-06-01 00:00:00 | B   | 15.00
-- out  2020-06-01 00:00:00 | C   |  0.88
-- out  2020-06-02 00:00:00 | A   | 22.00
-- out  2020-06-02 00:00:00 | B   | 15.00
-- out  2020-06-02 00:00:00 | C   | 13.00
-- out  2020-06-03 00:00:00 | A   | 66.00
-- out  2020-06-03 00:00:00 | B   | 88.00
-- out  2020-06-03 00:00:00 | C   | 99.00
-- out  2020-06-04 00:00:00 | A   | 66.00
-- out  2020-06-04 00:00:00 | B   | 88.00
-- out  2020-06-04 00:00:00 | C   | 99.00
-- out  2020-06-05 00:00:00 | A   | 66.00
-- out  2020-06-05 00:00:00 | B   | 88.00
-- out  2020-06-05 00:00:00 | C   | 99.00
-- out  2020-06-06 00:00:00 | A   | 66.00
-- out  2020-06-06 00:00:00 | B   | 88.00
-- out  2020-06-06 00:00:00 | C   | 99.00
-- out  2020-06-07 00:00:00 | A   | 66.00
-- out  2020-06-07 00:00:00 | B   | 88.00
-- out  2020-06-07 00:00:00 | C   | 99.00
-- out  2020-06-08 00:00:00 | A   | 66.00
-- out  2020-06-08 00:00:00 | B   | 88.00
-- out  2020-06-08 00:00:00 | C   | 99.00
-- out  2020-06-09 00:00:00 | A   | 66.00
-- out  2020-06-09 00:00:00 | B   | 88.00
-- out  2020-06-09 00:00:00 | C   | 99.00


推荐阅读