Union of grouped result sets in Hive

Question

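I need to break down a Hive query, grouped by the ID column, across the quarters of calendar year 2018. Below is how I currently do it; I am looking for another option that achieves the same result with fewer queries.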

-- Query 1: Q1 2018 (plus three identical queries for Q2, Q3 and Q4)

Create TABLE Q12018 stored as ORC as
select
ID,
count(1) as cnt, 
sum(revenue) as revenue,
sum( (CASE
    WHEN condition1
    THEN 1
    ELSE 0 END)) as metric1,
sum( (CASE
    WHEN condition2
    THEN revenue
    ELSE 0 END)) as metric2,           

sum( (CASE
    WHEN condition3
    THEN 1
    ELSE 0 END)) as metric3,
sum( (CASE
    WHEN condition4
    THEN revenue
    ELSE 0 END)) as metric4                            
from mainTable
where month between 201801 and 201803
group by 
ID;

-- Query 2: stack the four quarterly tables

Create TABLE combined2018 stored as ORC as

select * from  Q12018 

union all

select * from  Q22018 

union all

select * from  Q32018 

union all 

select * from  Q42018 ;

-- Query 3: aggregate the combined rows per ID

Create TABLE Agg2018 stored as ORC as

Select 
ID,
Sum(cnt) as cnt,
Sum(revenue) as revenue,
Sum(metric1) as metric1,
Sum(metric2) as metric2,
sum(metric3) as metric3,
sum(metric4) as metric4
from combined2018
group by ID;

Tags: hive, bigdata, hiveql

Answer


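It looks like, in the end, you are summing all of the quarterly results grouped by ID. If the final result is just the total of the quarterly results, change the where clause to cover the whole year and you get the same final result from a single query.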

select
   ID,
   count(1) as cnt, 
   sum(revenue) as revenue,
   sum((CASE  WHEN condition1  THEN 1  ELSE 0 END)) as metric1,
   sum((CASE  WHEN condition2  THEN revenue  ELSE 0 END)) as metric2,           
   sum((CASE  WHEN condition3  THEN 1  ELSE 0 END)) as metric3,
   sum((CASE  WHEN condition4  THEN revenue  ELSE 0 END)) as metric4                       
from mainTable
where month between 201801 and 201812
group by ID;
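
If the per-quarter figures are still needed alongside the yearly total, one possible sketch is to derive a quarter from month and let GROUPING SETS produce both levels in one pass. This assumes month is an integer in YYYYMM form (as the between filter suggests). The column name qtr is introduced here only for illustration, and the metric columns are left out for brevity.

```sql
-- Sketch only: assumes month is an integer such as 201801.
-- qtr is a hypothetical helper column, not part of the original schema.
select
   ID,
   qtr,
   count(1) as cnt,
   sum(revenue) as revenue
   -- the sum(CASE ...) metric columns from the query above can be appended here unchanged
from (
   select
      t.*,
      -- month % 100 yields 1..12; dividing by 3 and rounding up yields 1..4
      cast(ceil((month % 100) / 3) as int) as qtr
   from mainTable t
   where month between 201801 and 201812
) q
group by ID, qtr
grouping sets ((ID, qtr), (ID));
```

Rows where qtr is NULL hold the full-year totals per ID; rows with qtr 1 to 4 hold the quarterly figures.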
