PostgreSQL: aggregate based on how many times an item appears in a column

Problem description

I've been struggling to figure this out. Is there a simpler way to do it?

person  city      industry
Mark    Tokyo
Mark    Tokyo     Retail
Mark    Sydney
Chad    London    Marketing
Chad    New York
Chad    New York  Marketing

Desired output: for each person, keep the value that occurs most often in each column.

person  city      industry
Mark    Tokyo
Chad    New York  Marketing

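Is there a simpler or more efficient way than count > rank > select where rank = 1? I tried that approach, but I then have to join everything back on person. I have a lot of columns to process, so I'm looking for something simpler.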

Tags: sql, amazon-redshift

Solution


You can use window functions:

select t.person,
       ( array_agg(city order by cnt_city desc) )[1] as city,
       ( array_agg(industry order by cnt_industry desc) )[1] as industry
from (select t.*,
             count(*) over (partition by person, city) as cnt_city,
             count(*) over (partition by person, industry) as cnt_industry
      from t
     ) t
group by t.person;
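
To try the query above against the sample data from the question, a setup along these lines should be enough. The table name t and the columns person, city and industry come from the queries; the text column types, and the choice to store the blank industry cells as NULL, are assumptions.

-- Hypothetical setup: column types are assumed, not given in the question.
create table t (
    person   text,
    city     text,
    industry text
);

-- Blank industry cells in the question are stored as NULL here (an assumption).
insert into t (person, city, industry) values
    ('Mark', 'Tokyo',    null),
    ('Mark', 'Tokyo',    'Retail'),
    ('Mark', 'Sydney',   null),
    ('Chad', 'London',   'Marketing'),
    ('Chad', 'New York', null),
    ('Chad', 'New York', 'Marketing');

With this data, NULL is Mark's most frequent industry (two rows out of three), which matches the blank industry cell in the desired output.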

Alternatively, aggregate each column separately with distinct on:

select *
from (select distinct on (person) person, city
      from t
      group by person, city
      order by person, count(*) desc
     ) c join
     (select distinct on (person) person, industry
      from t
      group by person, industry
      order by person, count(*) desc
     ) i
     using (person);
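
On plain PostgreSQL (9.4 or later), the built-in ordered-set aggregate mode() might be the shortest option, since it needs neither a join nor a subquery. This is only a sketch against the same table t, with two caveats: mode() ignores NULL inputs, so Mark's industry would come back as Retail rather than blank, and as far as I know Redshift does not provide mode().

-- PostgreSQL only; NULLs are skipped, ties are broken arbitrarily.
select person,
       mode() within group (order by city)     as city,
       mode() within group (order by industry) as industry
from t
group by person;

Adding another column only takes one more mode() line, which may suit the case with many columns if ignoring NULLs is acceptable.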

Redshift supports neither distinct on nor arrays. So instead:

select *
from (select person, city,
             row_number() over (partition by person order by count(*) desc) as seqnum
      from t
      group by person, city
     ) c join
     (select person, industry,
             row_number() over (partition by person order by count(*) desc) as seqnum
      from t
      group by person, industry
     ) i
     using (person)
where c.seqnum = 1 and i.seqnum = 1;   
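
Since the question mentions many columns and wanting to avoid joining back on person, one possible variation is to do everything in a single pass over t using only window functions and conditional aggregation. This is a sketch rather than a tested Redshift query; the aliases c and r and the column names cnt_city, cnt_industry, rn_city and rn_industry are made up for illustration.

-- Sketch only: same table t as above; not verified on a Redshift cluster.
select person,
       -- Exactly one row per person has rn_city = 1, so max() simply picks it.
       max(case when rn_city = 1 then city end) as city,
       max(case when rn_industry = 1 then industry end) as industry
from (select c.*,
             -- Rank each person's rows by how common their city / industry is;
             -- the second sort key only makes ties deterministic.
             row_number() over (partition by person
                                order by cnt_city desc, city) as rn_city,
             row_number() over (partition by person
                                order by cnt_industry desc, industry) as rn_industry
      from (select t.*,
                   count(*) over (partition by person, city) as cnt_city,
                   count(*) over (partition by person, industry) as cnt_industry
            from t
           ) c
     ) r
group by person;

Each additional column then needs one more count(*) over, one more row_number() and one more max(case ...) line, but no extra join. If the most frequent value is NULL, the case expression yields NULL, so the result stays blank for that column.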
