How to extend the output of a window function with missing dates

Problem description

I have a table that looks like this:

    > DESC sensordata;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| id        | int(11)      | NO   | PRI | NULL    | auto_increment |
| device_id | int(11)      | YES  |     | NULL    |                |
| timestamp | datetime     | YES  | MUL | NULL    |                |
| ...                                                              |
+-----------+--------------+------+-----+---------+----------------+

(I know that naming a datetime column timestamp is not ideal, since it is a keyword.)

Minimal reproducible example:

CREATE TABLE `sensordata` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `device_id` int(11) DEFAULT NULL,
  `timestamp` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
);

INSERT INTO `sensordata` VALUES
  (1,1,'2020-01-01 13:10:00'),
  (2,1,'2020-01-02 13:20:00'),
  (3,1,'2020-01-04 13:40:00'),
  (4,2,'2020-01-01 13:10:00'),
  (5,2,'2020-01-02 13:10:00'),
  (6,2,'2020-01-04 13:10:00'),
  (7,2,'2020-01-04 13:10:00'),
  (8,2,'2020-01-04 13:10:00'),
  (9,2,'2020-01-04 13:10:00'),
  (10,2,'2020-01-05 13:10:00');

I want a running (cumulative) count of the rows in this table, per device_id and per date. So far I have this query:

SELECT DISTINCT(DATE(timestamp)) as time, 
       CAST(device_id as CHAR) as device, 
       count(id) OVER (PARTITION BY device ORDER BY time) AS sum 
   FROM sensordata

It gives me almost what I want:

+------------+--------+-----+
| time       | device | sum |
+------------+--------+-----+
| 2020-01-01 | 1      |   1 |
| 2020-01-02 | 1      |   2 |
| 2020-01-04 | 1      |   3 |
| 2020-01-01 | 2      |   1 |
| 2020-01-02 | 2      |   2 |
| 2020-01-04 | 2      |   6 |
| 2020-01-05 | 2      |   7 |
+------------+--------+-----+

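As you can see, the sum increases correctly over time. But I also need to include the dates on which the sum does not change, so the table should look like this: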

+------------+--------+-----+
| time       | device | sum |
+------------+--------+-----+
| 2020-01-01 | 1      |   1 |
| 2020-01-02 | 1      |   2 |
| 2020-01-03 | 1      |   2 |
| 2020-01-04 | 1      |   3 |
| 2020-01-05 | 1      |   3 |
| 2020-01-01 | 2      |   1 |
| 2020-01-02 | 2      |   2 |
| 2020-01-03 | 2      |   2 |
| 2020-01-04 | 2      |   6 |
| 2020-01-05 | 2      |   7 |
+------------+--------+-----+

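In this case, how can I include the missing dates with the correct sum? Is this possible with the window function I am using?

Edit: added a reproducible example.

Edit 2: extended the example with more rows per day and device.

Tags: mysql

Solution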


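-- cte1: one row per calendar day between the first and last reading
WITH RECURSIVE cte1 AS
( SELECT DATE(MIN(`timestamp`)) ts
  FROM sensordata
UNION ALL
  SELECT ts + INTERVAL 1 DAY
  FROM cte1
  WHERE ts < ( SELECT DATE(MAX(`timestamp`)) ts
               FROM sensordata ) ),
-- cte2: one row per device
cte2 AS ( SELECT DISTINCT device_id
          FROM sensordata )
-- Pair every day with every device, then attach that day's readings (if any).
-- DISTINCT collapses the duplicate rows produced by days with several readings.
SELECT DISTINCT
       cte1.ts, 
       cte2.device_id, 
       COUNT(sensordata.id) OVER ( PARTITION BY cte2.device_id
                                   ORDER BY cte1.ts ) `sum`
FROM cte1
CROSS JOIN cte2
LEFT JOIN sensordata ON cte1.ts = DATE(sensordata.`timestamp`)
                    AND cte2.device_id = sensordata.device_id;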
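
To see why this fills the gaps, it can help to look at the date-by-device scaffold on its own, before sensordata is joined to it. The following is only a sketch, assuming MySQL 8.0 or later and the sample data from the question; the CTE names calendar and devices are hypothetical stand-ins for cte1 and cte2.

```sql
-- Sketch: the scaffold that the LEFT JOIN in the query above fills in.
-- Assumes MySQL 8.0+ and the sample sensordata table from the question;
-- the names calendar and devices are hypothetical.
WITH RECURSIVE calendar AS (
  -- Anchor row: the date of the earliest reading
  SELECT DATE(MIN(`timestamp`)) AS ts
  FROM sensordata
  UNION ALL
  -- Recursive step: add one day until the date of the latest reading
  SELECT ts + INTERVAL 1 DAY
  FROM calendar
  WHERE ts < (SELECT DATE(MAX(`timestamp`)) FROM sensordata)
),
devices AS (
  SELECT DISTINCT device_id
  FROM sensordata
)
-- Every device paired with every day, whether or not a reading exists
SELECT devices.device_id, calendar.ts
FROM calendar
CROSS JOIN devices
ORDER BY devices.device_id, calendar.ts;
```

With the sample data this should return 10 rows: each of the two devices paired with each day from 2020-01-01 to 2020-01-05. When sensordata is left-joined onto this scaffold, days without readings carry a NULL id, which COUNT(sensordata.id) ignores, so the running count simply stays flat on those days. Note that MySQL limits recursion to 1000 levels by default (cte_max_recursion_depth), so a date range longer than that would need the limit raised.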
