How to replace UNION clauses in a MariaDB statement

Problem description

(The code to create the test tables is at the bottom of this message.)

Given two tables:

SELECT * FROM test_users;
+----+-------+
| ID | name  |
+----+-------+
|  1 | Tom   |
|  2 | Wendy |
|  3 | Fred  |
|  4 | Sandy |
+----+-------+
4 rows in set (0.000 sec)

SELECT * FROM test_hours;
+----+----------+------+-------+
| ID | users_ID | year | hours |
+----+----------+------+-------+
|  1 |        1 | 2018 |     3 |
|  2 |        1 | 2018 |     5 |
|  3 |        1 | 2019 |     7 |
|  4 |        1 | 2019 |     2 |
|  5 |        1 | 2019 |     9 |
|  6 |        1 | 2020 |     5 |
|  7 |        1 | 2020 |     9 |
|  8 |        2 | 2018 |     7 |
|  9 |        2 | 2018 |    11 |
| 10 |        2 | 2018 |     8 |
| 11 |        2 | 2019 |    10 |
| 12 |        2 | 2019 |    12 |
| 13 |        3 | 2018 |     4 |
| 14 |        3 | 2018 |     1 |
| 15 |        3 | 2018 |    15 |
| 16 |        3 | 2020 |    10 |
| 17 |        3 | 2020 |    12 |
| 18 |        4 | 2019 |     7 |
| 19 |        4 | 2019 |    11 |
| 20 |        4 | 2020 |     4 |
| 21 |        4 | 2020 |     6 |
+----+----------+------+-------+
21 rows in set (0.000 sec)

I can use a very simple join to get a summary of hours by user and year:

SELECT name, year, SUM(hours) 
FROM test_hours 
JOIN test_users
  ON users_ID = test_users.ID
GROUP BY users_ID, year;
+-------+------+------------+
| name  | year | SUM(hours) |
+-------+------+------------+
| Tom   | 2018 |          8 |
| Tom   | 2019 |         18 |
| Tom   | 2020 |         14 |
| Wendy | 2018 |         26 |
| Wendy | 2019 |         22 |
| Fred  | 2018 |         20 |
| Fred  | 2020 |         22 |
| Sandy | 2019 |         18 |
| Sandy | 2020 |         10 |
+-------+------+------------+
9 rows in set (0.001 sec)

If I only want one year, I can do this:

SELECT name, year, SUM(hours) 
FROM test_hours 
JOIN test_users
  ON users_ID = test_users.ID
WHERE year = 2020
GROUP BY users_ID, year;
+-------+------+------------+
| name  | year | SUM(hours) |
+-------+------+------------+
| Tom   | 2020 |         14 |
| Fred  | 2020 |         22 |
| Sandy | 2020 |         10 |
+-------+------+------------+
3 rows in set (0.000 sec)

Wendy drops out of the result because she has no hours in 2020. What I really want, though, is:

+--------+------+------------+
| name   | year | SUM(hours) |
+--------+------+------------+
| Tom    | 2020 |         14 |
| Wendy  | 2020 |          0 |
| Fred   | 2020 |         22 |
| Sandy  | 2020 |         10 |
+--------+------+------------+

I can do this with a UNION clause:

SELECT name, year, SUM(hours) 
FROM test_hours 
JOIN test_users
  ON users_ID = test_users.ID
WHERE year = 2020
GROUP BY users_ID, year

UNION

SELECT DISTINCT name, 2020, 0 
FROM test_hours 
JOIN test_users
  ON users_ID = test_users.ID
WHERE NOT EXISTS (
    SELECT *
    FROM test_hours
    WHERE users_ID = test_users.ID AND year = 2020);
+-------+------+------------+
| name  | year | SUM(hours) |
+-------+------+------------+
| Tom   | 2020 |         14 |
| Fred  | 2020 |         22 |
| Sandy | 2020 |         10 |
| Wendy | 2020 |          0 |
+-------+------+------------+
4 rows in set (0.001 sec)

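But I'm wondering whether there is a better way; the full SQL statement already contains many UNION clauses, and I'm trying to work out how to eliminate them.

I can't figure out how to do it.

Any ideas?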

CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;

DROP TABLE IF EXISTS test_hours;
CREATE TABLE test_hours (
  ID int(10) AUTO_INCREMENT PRIMARY KEY,
  users_ID int(10),
  year int(4),
  hours int(4)
);

DROP TABLE IF EXISTS test_users;
CREATE TABLE test_users (
  ID int(10),
  name varchar(60) 
);

INSERT INTO test_users (ID, name) VALUES
  (1, 'Tom'), (2, 'Wendy'), (3, 'Fred'), (4, 'Sandy');

INSERT INTO test_hours (users_ID, year, hours) VALUES
  (1, 2018, 3), (1, 2018, 5), (1, 2019, 7), (1, 2019, 2), (1, 2019, 9), (1, 2020, 5), (1, 2020, 9),
  (2, 2018, 7), (2, 2018, 11), (2, 2018, 8), (2, 2019, 10), (2, 2019, 12),
  (3, 2018, 4), (3, 2018, 1), (3, 2018, 15), (3, 2020, 10), (3, 2020, 12),
  (4, 2019, 7), (4, 2019, 11), (4, 2020, 4), (4, 2020, 6);

Tags: sql, mariadb, query-optimization, union

Solution


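You have an over-complicated example of a common problem: you need a list of things (years, or days, or hours) even when some of them are missing from the dataset, and you want the missing ones shown as zero.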

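The general solution is to start with a table of all possible values. This may need to be generated independently of the data. MariaDB has a particularly handy way to do that: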

SELECT seq AS year  -- Sequence engine tables expose one column, named seq
    FROM seq_2000_to_2100  -- so you don't have to change this
    WHERE seq BETWEEN 2018 AND 2020  -- unclear on what range you want
    -- You could do MIN(year) to MAX(year) from test_hours
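
The last comment above suggests deriving the range from the data instead of hard-coding it. A minimal sketch of that idea, assuming the Sequence storage engine is available on your server (its virtual tables expose their values in a column named seq) and that every year of interest falls between 2000 and 2100:

```sql
-- One row per year between the earliest and latest year found in test_hours.
-- Assumption: all years in test_hours lie within 2000..2100.
SELECT seq AS year
FROM seq_2000_to_2100
WHERE seq BETWEEN (SELECT MIN(year) FROM test_hours)
              AND (SELECT MAX(year) FROM test_hours);
```

With the test data from the question this should yield 2018, 2019 and 2020, including any year in between that has no rows at all.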

Then use it as follows:

SELECT u.name,
       year,
       COALESCE(SUM(h.hours), 0) AS tot_hours  -- Note COALESCE to avoid NULL
    FROM ( ( the above query ) AS y
           LEFT JOIN test_hours AS h USING(year)  -- note LEFT
         )
    JOIN test_users AS u ...
    WHERE ...
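
The outline above leaves the join to the users table and the filter elided. One possible way to fill it in for the question's 2020 report is sketched below; this is an assumption about the intent, not necessarily the exact query the answer had in mind. It pairs every user with every wanted year first (CROSS JOIN), then LEFT JOINs the hours, so a user with no rows for that year still appears. The aliases u, y and h are illustrative.

```sql
-- Sketch: one row per user for 2020, with 0 when the user has no hours.
SELECT u.name,
       y.year,
       COALESCE(SUM(h.hours), 0) AS tot_hours   -- SUM over no rows is NULL
FROM test_users AS u
CROSS JOIN (
    SELECT seq AS year
    FROM seq_2000_to_2100
    WHERE seq = 2020                            -- widen to a BETWEEN for more years
) AS y
LEFT JOIN test_hours AS h
       ON h.users_ID = u.ID
      AND h.year = y.year                       -- year test belongs in ON, not WHERE
GROUP BY u.ID, u.name, y.year
ORDER BY u.ID, y.year;
```

Against the test tables this should return Tom 14, Wendy 0, Fred 22 and Sandy 10, the same rows as the UNION version. The year condition sits in the ON clause on purpose: moving it to WHERE would discard the NULL-extended row for Wendy and turn the LEFT JOIN back into an inner join.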
    
