首页 > 解决方案 > 查询返回重复值

问题描述

下面的查询显示了一些重复和错误的值:

SELECT c.contest_id, c.hacker_id, c.name,
SUM(s.total_submissions) as total_submissions,
SUM(s.total_accepted_submissions) as total_accepted_submissions,
SUM(v.total_views) as total_views,
SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN view_stats v ON v.challenge_id = d.challenge_id
LEFT JOIN submission_stats s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id;

输出应如下所示:

contest_id | hacker_id | name   | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
|      66406 |     17973 | Rose   |               111 |                         39 |         156 |                56 |
|      66556 |     79153 | Angela |                 0 |                          0 |          11 |                 10 |
|      94828 |     80275 | Frank  |               150 |                         38 |          41 |                15

但它是这样出来的:

contest_id | hacker_id | name   | total_submissions | total_accepted_submissions | total_views | total_unique_views |
+------------+-----------+--------+-------------------+----------------------------+-------------+--------------------+
|      66406 |     17973 | Rose   |               222 |                         78 |         238 |                122 |
|      66556 |     79153 | Angela |              NULL |                       NULL |          11 |                 10 |
|      94828 |     80275 | Frank  |               150 |                         38 |          82 |                 30

表架构:

在此处输入图像描述

标签: mysqlsql

解决方案


发生什么了

view_stats问题是submission_stats每个challenge_id.

查询中的JOINs 发生在GROUP BY和之前SUMGROUP BY所以想象一下,没有and的查询的结果集SUM

一个简化的例子是:

ids table:

id
--
 1


x table:

id|vx
------
 1|11
 1|22


y table:

id|vy
------
 1| 1

的结果

SELECT ids.id, x.vx, y.vy
FROM ids
LEFT JOIN x on x.id = ids.id
LEFT JOIN y on y.id = ids.id;

将会

| id  | vx  | vy  |
| --- | --- | --- |
| 1   | 11  | 1   |
| 1   | 22  | 1   |

1请注意列中的重复项vy,尽管在原始y表中只有一行。发生这种情况是因为id=1table 中有两行x。这些是首先连接的,因此也复制了ids表的行。然后y加入这些已经复制的行,这些行也复制了行y。当SUM'ing和分组时,我们最终得到。

| id  | SUM(vy) |
| --- | ------- |
| 1   |       2 |

你可以在这里找到一个带有简化示例的 dbfiddle 。

解决方案

有多种方法可以解决这个问题。最直观的就是加入它们之前的行GROUP和行。SUMview_statssubmission_stats

SELECT c.contest_id, c.hacker_id, c.name,
   SUM(s.total_submissions) as total_submissions,
   SUM(s.total_accepted_submissions) as total_accepted_submissions,
   SUM(v.total_views) as total_views,
   SUM(v.total_unique_views) as total_unique_views
FROM concursos c
JOIN faculdades f ON f.contest_id = c.contest_id
JOIN desafios d ON d.college_id = f.college_id
LEFT JOIN (
   SELECT 
      challenge_id, 
      SUM(total_views) as total_views,
      SUM(total_unique_views) as total_unique_views
   FROM view_stats
   GROUP BY challenge_id
) v ON v.challenge_id = d.challenge_id
LEFT JOIN (
   SELECT 
      challenge_id,
      SUM(total_submissions) as total_submissions,
      SUM(total_accepted_submissions) as total_accepted_submissions
   FROM submission_stats
   GROUP BY challenge_id
) s ON s.challenge_id = d.challenge_id
GROUP BY c.contest_id
# to output only rows with non zero sums
HAVING
   IFNULL(SUM(s.total_submissions), 0) <> 0
   OR IFNULL(SUM(s.total_accepted_submissions), 0) <> 0
   OR IFNULL(SUM(v.total_views), 0) <> 0
   OR IFNULL(SUM(v.total_unique_views), 0) <> 0;

推荐阅读