首页 > 解决方案 > 创建一个带有计数的矩阵 - hive sql

问题描述

有没有办法用蜂巢来实现这一点?我需要计算每个细分市场的用户。

我有一个表:
user1,categoryA
user1,categoryB
user2,categoryC

期望的输出是:

----------------- A类、B类、C
类 A类 -- 1 1 0
B类 -- 1 1 0
C类 -- 0 0 1

标签: sqlhadoophivehiveql

解决方案


对于静态类别集,这是可能的:

with your_data as(
select stack (6, 
'user1', 'categoryA',
'user1', 'categoryB',
'user2', 'categoryC',
'user2', 'categoryC',
'user3', 'categoryA',
'user4', 'categoryA'                  
) as (`user`, category)
)

select 
      category, sum(catA) as CategoryA, sum(catB) as CategoryB, sum(catC) as CategoryC
from
(
  select `user` , category, --each user counted once per category
          max(case when category='categoryA' then 1 else 0 end) over (partition by `user`) as catA,
          max(case when category='categoryB' then 1 else 0 end) over (partition by `user`) as catB,
          max(case when category='categoryC' then 1 else 0 end) over (partition by `user`) as catC
  from your_data
   group by  `user` , category
)s
group by Category
order by category

结果:

category    categorya   categoryb   categoryc
categoryA      3           1           0
categoryB      1           1           0
categoryC      0           0           1

推荐阅读