sql - 在 DB2 中执行 sql 查询时的复杂性和时间消耗
问题描述
我有一个包含超过 2 亿条记录的表,并试图在单个查询中获得结果,执行以下查询大约需要 1.5 小时。
我的表格数据:
+---------+------------+-----+-----+-----+
| cust_id | product_id | p_a | p_b | p_c |
+---------+------------+-----+-----+-----+
| 1 | a | 1 | 0 | 0 |
| 1 | b | 0 | 2 | 0 |
| 1 | a | 1 | 0 | 0 |
| 1 | c | 0 | 0 | 3 |
| 2 | a | 1 | 0 | 0 |
| 2 | c | 0 | 0 | 3 |
| 2 | c | 0 | 0 | 3 |
| 3 | b | 0 | 1 | 0 |
+---------+------------+-----+-----+-----+
我需要以下结果:
+----------------+-----+-----+-----+------+-------+
| No.ofcustomers | p_a | p_b | p_c | P_ab | P_abc |
+----------------+-----+-----+-----+------+-------+
| 3 | 0 | 1 | 0 | 0 | 1 |
+----------------+-----+-----+-----+------+-------+
p_a 返回仅购买产品 a 的客户数量。类似地,p_b、p_c、p_ab、p_abc。
到目前为止,我已经编写了以下查询,它给了我输出。但是,有什么方法可以减少执行查询所需的时间。我的意思是 1.5 小时对于任何查询来说都是非常高的。
WITH CTE AS
(
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID), CTE1 AS
(
SELECT COUNT(DISTINCT CUST_ID) AS NOOFCUSTOMERS,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN CUST_ID END)) AS P_A,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_B,
COUNT(DISTINCT (CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN CUST_ID END)) AS P_C,
COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_AB,
COUNT(DISTINCT (CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN CUST_ID END)) AS P_ABC
FROM CTE)
SELECT *
FROM CTE1;
解决方案
您可以将查询简化为:
WITH CTE AS (
SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
FROM TABLE
GROUP BY CUST_ID
)
SELECT COUNT(*) AS NOOFCUSTOMERS,
SUM(CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN 1 ELSE 0 END) AS P_A,
SUM(CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_B,
SUM(CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN 1 ELSE 0 END) AS P_C,
SUM(CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_AB,
SUM(CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN 1 ELSE 0 END) AS P_ABC
FROM CTE;
第一个 CTE 按客户 ID 进行汇总,因此这些已经是唯一的。 COUNT(DISTINCT)
比其他聚合函数更昂贵,因此可能会损害性能。
推荐阅读
- firebase - 如何在一个项目下创建多个 Vue 应用
- python - 如何将 Dask DataFrame 与 Numpy Busday_count 一起使用?
- xml - Powershell 移动 xml 节点
- elixir - Jason 在 Elixir 中将二进制编码为 UUID
- python - 在 python 中调用 CosmosClient(url,key,proxy_config = d) 时出现错误?
- javascript - 由于引号不同,Array.prototype.filter() 不适用于对象数组
- python - JSONDecodeError:期望值 Python
- python - 在 Python 中将“大”整数转换为字节时遇到问题
- laravel - 一个modelName.php 和一个ModelNameController.php 可以处理多个相同(或相似)的表吗?
- reactjs - 在 Material UI 自动完成中的 Textfield 输入中显示匹配搜索选项的数量