Select unique values from two columns using GROUP BY

Problem description

I have a table:

CREATE TABLE stats_test
(
  id1 bigint,
  id2 bigint,
  date timestamp with time zone
);    

and this data in it:

 id1 | id2 |          date
-----+-----+------------------------
   1 |   2 | 2020-12-01 00:00:00+00
   2 |   1 | 2020-12-01 00:00:00+00
   3 |   4 | 2020-11-01 00:00:00+00
   4 |   3 | 2020-11-01 00:00:00+00
   1 |   3 | 2020-12-01 00:00:00+00
   1 |   3 | 2020-11-01 00:00:00+00

With this query, I get the following result:

SELECT EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date), 
COUNT(DISTINCT id1) AS unique_id1, COUNT(DISTINCT id2) AS unique_id2 
FROM stats_test GROUP BY EXTRACT(YEAR FROM date), EXTRACT(MONTH FROM date);

 date_part | date_part | unique_id1 | unique_id2
-----------+-----------+------------+------------
      2020 |        11 |          3 |          2
      2020 |        12 |          2 |          3

How can I get another column with the count of unique IDs taken from the combined set of both columns (id1, id2), grouped by year and month?

 date_part | date_part | unique_id1 | unique_id2 | unique_both_ids
-----------+-----------+------------+------------+----------------
      2020 |        11 |          3 |          2 |
      2020 |        12 |          2 |          3 |    

Tags: sql, postgresql

Solution


count(distinct ...) accepts only a single expression (so count(distinct id1, id2) is rejected), but you can use an anonymous row expression to work around that limitation:

select extract(year from date) as year, 
       extract(month from date) as month,
       count(distinct id1) as unique_id1, 
       count(distinct id2) as unique_id2,
       count(distinct (id1,id2)) as unique_both_ids
from stats_test 
group by extract(year from date), extract(month from date);

Note that 1,2 and 2,1 will be treated as two different things. If you want them to be treated as the same, use: count(distinct (least(id1,id2), greatest(id1,id2)))
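
To show how the least/greatest variant fits into the full statement, here is a minimal sketch that places it next to the ordered-pair count. It reuses the stats_test table and grouping from the query above; the alias unique_unordered_pairs is a name invented for this illustration.

```sql
-- Sketch only: same table and grouping as the answer's query.
-- unique_unordered_pairs is a hypothetical alias chosen for this example.
select extract(year from date) as year,
       extract(month from date) as month,
       -- (1,2) and (2,1) count as two different pairs here
       count(distinct (id1, id2)) as unique_both_ids,
       -- least/greatest put the smaller id first, so (1,2) and (2,1) collapse into one pair
       count(distinct (least(id1, id2), greatest(id1, id2))) as unique_unordered_pairs
from stats_test
group by extract(year from date), extract(month from date);
```

On the sample data, and assuming the session time zone is UTC so that the months come out as shown in the question, unique_both_ids should be 3 for both months and unique_unordered_pairs should be 2 for both months.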


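The question can also be read as asking for the number of distinct IDs that appear in either column, rather than the number of distinct (id1, id2) pairs. The answer above does not cover that reading; one possible sketch, assuming PostgreSQL's LATERAL and VALUES, unpivots the two columns into a single column before counting. The names v and id are introduced here for illustration only.

```sql
-- Sketch only: counts distinct ids across both columns, not distinct pairs.
-- v and id are hypothetical names; each source row is expanded into two rows,
-- one carrying id1 and one carrying id2.
select extract(year from date) as year,
       extract(month from date) as month,
       count(distinct id1) as unique_id1,    -- unaffected by the row duplication
       count(distinct id2) as unique_id2,
       count(distinct v.id) as unique_both_ids
from stats_test
cross join lateral (values (id1), (id2)) as v(id)
group by 1, 2;
```

On the sample data, again assuming a UTC session time zone, this should give unique_both_ids = 3 for November (ids 1, 3, 4) and 3 for December (ids 1, 2, 3).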