首页 > 解决方案 > 获得多个基于时间间隔的总和的有效方法?

问题描述

有没有一种有效的方法来获取多个组的多个时间范围内的总和(下面的示例)?

我正在使用 MySQL 8。

PS:我很难把这个问题表述得尽可能清楚。我很乐意接受建议以使其更清晰/易于理解......


OBS:

原始表:

SELECT* FROM pluv_reg ORDER BY dth DESC;

id   dth                    valor
1    2020-05-07 16:20:00    5.0
2    2020-05-07 16:20:00    5.0
3    2020-05-07 16:20:00    5.0
4    2020-05-07 16:20:00    5.0
1    2020-05-07 16:15:00    5.0
2    2020-05-07 16:15:00    5.0
...

期望回报:

id   dth                    m05     m15    m30     h01     h24      h48      h96
1    2020-05-07 16:20:00    5.0    15.0    60.0    60.0    100.0    100.0    196.0
2    2020-05-07 16:20:00    5.0    15.0    60.0    60.0    100.0    100.0    197.0
3    2020-05-07 16:20:00    5.0    15.0    60.0    60.0    100.0    100.0    198.0
4    2020-05-07 16:20:00    5.0    15.0    60.0    60.0    100.0    100.0    199.0
1    2020-05-07 16:15:00    5.0    15.0    55.0    55.0    95.0     95.0     191.0
2    2020-05-07 16:15:00    5.0    15.0    55.0    55.0    95.0     95.0     192.0
3    2020-05-07 16:15:00    5.0    15.0    55.0    55.0    95.0     95.0     193.0
...

LEGEND
Column m05 -> Sum of "valor" of the last 05 minutes for that "id" at that time
...
Column m30 -> Sum of "valor" of the last 30 minutes for that "id" at that time
...
Column h48 -> Sum of "valor" of the last 48 hours for that "id" at that time

当前查询:

SELECT *
FROM (
    SELECT pluv_reg.id,
            pluv_reg.dth,
            SUM(pluv_reg.valor)  OVER w05 AS 'm05',
            SUM(pluv_reg.valor)  OVER w15 AS 'm15',
            SUM(pluv_reg.valor)  OVER w30 AS 'm30',
            SUM(pluv_reg.valor)  OVER w1 AS 'h01',
            SUM(pluv_reg.valor)  OVER w24 AS 'h24',
            SUM(pluv_reg.valor)  OVER w48 AS 'h48',
            SUM(pluv_reg.valor)  OVER w96 AS 'h96'
    FROM pluv_reg

    # this WHERE clause ensures that all the rows I need have the correct values ______________________________
    WHERE pluv_reg.dth > DATE_SUB(CAST('2020-05-07 16:20:00' AS DATETIME), INTERVAL 98 HOUR)

    WINDOW  w05 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '0:04' HOUR_MINUTE PRECEDING),
            w15 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '0:14' HOUR_MINUTE PRECEDING),
            w30 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '0:29' HOUR_MINUTE PRECEDING),
            w1 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '0:59' HOUR_MINUTE PRECEDING),
            w24 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '23:59' HOUR_MINUTE PRECEDING),
            w48 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '47:59' HOUR_MINUTE PRECEDING),
            w96 AS (PARTITION BY id ORDER BY dth RANGE INTERVAL '95:59' HOUR_MINUTE PRECEDING)
    ) subquery

# this HAVING clause returns only the rows I need _____________________________________________________________
HAVING subquery.dth >= DATE_SUB(CAST('2020-05-07 16:20:00' AS DATETIME), INTERVAL 2 HOUR);

我的当前查询(上图)有效,但在我的实际生产表上需要很长时间(最多 10 秒。我想加快速度至少快 10 倍)。

我的CURRENT QUERY的问题在于它计算了过去 98 小时内所有行的总和( WHERE pluv_reg.dth > DATE_SUB(CAST('2020-05-07 16:20:00' AS DATETIME), INTERVAL 98 HOUR ) ) 即使我只需要过去 2 小时的这些总和HAVING subquery.dth >= DATE_SUB(CAST('2020-05-07 16:20:00' AS DATETIME), INTERVAL 2 HOUR));

在下面的示例中,子查询本身产生 32 行(生产时为 40.000)。HAVING子句将其过滤为仅 20 行(生产时为 700 行)——这些是我需要的唯一行。

创建和插入语句:

CREATE TABLE pluv_reg (
id INT,
dth DATETIME,
valor DECIMAL(5,1),

PRIMARY KEY(id, dth)
);

INSERT INTO pluv_reg (id,dth,valor) VALUES
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 0 MINUTE), 5),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 5 MINUTE), 5),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 MINUTE), 5),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 15 MINUTE), 5),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 20 MINUTE), 40),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 HOUR), 40),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 60 HOUR), 72),
(1, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 90 HOUR), 24),

(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 0 MINUTE), 5),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 5 MINUTE), 5),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 MINUTE), 5),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 15 MINUTE), 5),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 20 MINUTE), 40),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 HOUR), 40),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 60 HOUR), 72),
(2, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 90 HOUR), 24),

(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 0 MINUTE), 5),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 5 MINUTE), 5),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 MINUTE), 5),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 15 MINUTE), 5),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 20 MINUTE), 40),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 HOUR), 40),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 60 HOUR), 72),
(3, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 90 HOUR), 24),

(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 0 MINUTE), 5),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 5 MINUTE), 5),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 MINUTE), 5),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 15 MINUTE), 5),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 20 MINUTE), 40),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 10 HOUR), 40),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 60 HOUR), 72),
(4, (CAST('2020-05-07 16:20:00' AS DATETIME) - INTERVAL 90 HOUR), 24)
;

标签: mysqlsqlwindow-functions

解决方案


推荐阅读