How to count occurrences of a specific text string and group by another column

Problem description

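I have a table, population_table, with the columns user_id, provider_name, and city. I want to count how many times each provider's users appear in each city. For example, I would like the output to look like this: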

provider_name |  Users |  Atlanta | Chicago | New York
______________________________________________________
Alpha            100        50         25        25
Beta             200       100         75        25
Kappa            500       300        100       100

I tried:

select provider_name, count (distinct user_id) AS Users, count(city) AS City 
from population_table
group by provider_name

How can I write this query to get a per-city breakdown of users for each provider?

Tags: sql, apache-spark

Solution


I think you want conditional aggregation. It is not clear from your description that count(distinct) is necessary, so I would try this first:

select provider_name, count(*) AS Users,
       sum(case when city = 'Atlanta' then 1 else 0 end) as Atlanta,
       sum(case when city = 'Chicago' then 1 else 0 end) as Chicago,
       sum(case when city = 'New York' then 1 else 0 end) as New_York
from population_table
group by provider_name;

If count(distinct) is necessary (for example, because a user_id can appear in more than one row for the same provider):

select provider_name, count(distinct user_id) AS Users,
       count(distinct case when city = 'Atlanta' then user_id end) as Atlanta,
       count(distinct case when city = 'Chicago' then user_id end) as Chicago,
       count(distinct case when city = 'New York' then user_id end) as New_York
from population_table
group by provider_name;
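
To see how the row-count version and the count(distinct) version can differ, here is a minimal sketch that runs both on a tiny in-memory table. It makes two assumptions: the sample rows are invented for illustration, and SQLite (through Python's sqlite3 module) stands in for Spark SQL. The queries use only standard SQL, so they should behave the same way in Spark, but that is not verified here. An order by is added only to make the printed result deterministic.

```python
import sqlite3

# Hypothetical sample rows (user_id, provider_name, city).
# User 1 appears twice in Atlanta; user 3 appears in two different cities.
rows = [
    (1, "Alpha", "Atlanta"),
    (1, "Alpha", "Atlanta"),
    (2, "Alpha", "Chicago"),
    (3, "Beta", "New York"),
    (3, "Beta", "Atlanta"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table population_table (user_id integer, provider_name text, city text)"
)
conn.executemany("insert into population_table values (?, ?, ?)", rows)

# Conditional aggregation over rows: every row counts once.
ROW_COUNTS = """
select provider_name, count(*) as Users,
       sum(case when city = 'Atlanta' then 1 else 0 end) as Atlanta,
       sum(case when city = 'Chicago' then 1 else 0 end) as Chicago,
       sum(case when city = 'New York' then 1 else 0 end) as New_York
from population_table
group by provider_name
order by provider_name
"""

# Conditional aggregation over distinct users: the case expression yields
# NULL for other cities, and count(distinct ...) ignores NULLs.
DISTINCT_COUNTS = """
select provider_name, count(distinct user_id) as Users,
       count(distinct case when city = 'Atlanta' then user_id end) as Atlanta,
       count(distinct case when city = 'Chicago' then user_id end) as Chicago,
       count(distinct case when city = 'New York' then user_id end) as New_York
from population_table
group by provider_name
order by provider_name
"""

row_counts = conn.execute(ROW_COUNTS).fetchall()
distinct_counts = conn.execute(DISTINCT_COUNTS).fetchall()

print(row_counts)       # [('Alpha', 3, 2, 1, 0), ('Beta', 2, 1, 0, 1)]
print(distinct_counts)  # [('Alpha', 2, 1, 1, 0), ('Beta', 1, 1, 0, 1)]
```

In the distinct version, the city columns for Beta add up to more than Users, because the same user is counted once in each city where they appear.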
