首页 > 解决方案 > 在 DB2 中执行 sql 查询时的复杂性和时间消耗

问题描述

我有一个包含超过 2 亿条记录的表,并试图在单个查询中获得结果,执行以下查询大约需要 1.5 小时。

我的表格数据:

+---------+------------+-----+-----+-----+
| cust_id | product_id | p_a | p_b | p_c |
+---------+------------+-----+-----+-----+
|       1 | a          |   1 |   0 |   0 |
|       1 | b          |   0 |   2 |   0 |
|       1 | a          |   1 |   0 |   0 |
|       1 | c          |   0 |   0 |   3 |
|       2 | a          |   1 |   0 |   0 |
|       2 | c          |   0 |   0 |   3 |
|       2 | c          |   0 |   0 |   3 |
|       3 | b          |   0 |   1 |   0 |
+---------+------------+-----+-----+-----+

我需要以下结果:

+----------------+-----+-----+-----+------+-------+
| No.ofcustomers | p_a | p_b | p_c | P_ab | P_abc |
+----------------+-----+-----+-----+------+-------+
|              3 |   0 |   1 |   0 |    0 |     1 |
+----------------+-----+-----+-----+------+-------+

p_a 返回仅购买产品 a 的客户数量。类似地,p_b、p_c、p_ab、p_abc。

到目前为止,我已经编写了以下查询,它给了我输出。但是,有什么方法可以减少执行查询所需的时间。我的意思是 1.5 小时对于任何查询来说都是非常高的。

WITH CTE AS
       (
    SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
      FROM TABLE
      GROUP BY CUST_ID), CTE1 AS
       (
    SELECT COUNT(DISTINCT CUST_ID) AS NOOFCUSTOMERS, 
        COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN CUST_ID END)) AS P_A, 
        COUNT(DISTINCT (CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_B, 
        COUNT(DISTINCT (CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN CUST_ID END)) AS P_C, 
        COUNT(DISTINCT (CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN CUST_ID END)) AS P_AB, 
        COUNT(DISTINCT (CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN CUST_ID END)) AS P_ABC
      FROM CTE)
  SELECT *
    FROM CTE1;

标签: sqldb2

解决方案


您可以将查询简化为:

WITH CTE AS (
      SELECT CUST_ID, SUM(P_A) AS PA, SUM(P_B) AS PB, SUM(P_C) AS PC
      FROM TABLE
      GROUP BY CUST_ID
     )
SELECT COUNT(*) AS NOOFCUSTOMERS, 
       SUM(CASE WHEN PA >= 1 AND PB = 0 AND PC = 0 THEN 1 ELSE 0 END) AS P_A, 
       SUM(CASE WHEN PA = 0 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_B, 
       SUM(CASE WHEN PA = 0 AND PB = 0 AND PC >= 1 THEN 1 ELSE 0 END) AS P_C, 
       SUM(CASE WHEN PA >= 1 AND PB >= 1 AND PC = 0 THEN 1 ELSE 0 END) AS P_AB, 
       SUM(CASE WHEN PA <> 0 AND PB <> 0 AND PC <> 0 THEN 1 ELSE 0 END) AS P_ABC
FROM CTE;

第一个 CTE 按客户 ID 进行汇总,因此这些已经是唯一的。 COUNT(DISTINCT)比其他聚合函数更昂贵,因此可能会损害性能。


推荐阅读