首页 > 解决方案 > 为列 A 的每个值选择列 B 的 N 个最频繁值

问题描述

使用 MySQL 表,如:

id | colA | colB
...| 1    | 13
...| 1    | 13
...| 1    | 12
...| 1    | 12
...| 1    | 11
...| 2    | 78
...| 2    | 78
...| 2    | 78
...| 2    | 13
...| 2    | 13
...| 2    | 9

对于中的每个值,colA我想在 中找到 N 个最常见的值colB

N=2 的示例结果:

colA | colB
1    | 13
1    | 12
2    | 78
2    | 13

我可以使用以下方法获得所有独特colAcolB频率组合:

SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC;

示例结果:

colA | colB | freq
1    | 13   | 2
1    | 12   | 2
1    | 11   | 1
2    | 78   | 3
2    | 13   | 2
2    | 9    | 1

但是我很难LIMIT为每个值colA而不是整个表应用一个。

这基本上就像如何在每个 id 组的列中选择最频繁的值?,仅适用于 MySQL 而不是 PostgreSQL。

我目前正在使用 MariaDB 10.1。

标签: mysqlselectmariadbgroupwise-maximum

解决方案


Use window functions, if you can:

SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
             DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) as seqnum
      FROM t
      GROUP BY colA, colB 
     ) ab
WHERE seqnum <= 2;

Note that you might want DENSE_RANK(), RANK() or ROW_NUMBER() depending on how you want to treat ties. If there are 5 colB values with the two highest ranks, then DENSE_RANK() will return all five.

If you want exactly two values, then use ROW_NUMBER().


推荐阅读