Hive: maximum value across multiple columns

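Problem description

Hi: I have a situation where I need to find the maximum of 3 calculated fields and store it in another field. Can this be done in a single SQL query? Here is an example: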

SELECT Income1 ,
       Income1 * 0.02 as Personal_Income ,  -- 2%
       Income2 ,
       Income2 * 0.10 as Share_Income ,     -- 10%
       Income3 ,
       Income3 * 0.01 as Job_Income ,       -- 1%
       Max(Personal_Income, Share_Income, Job_Income )  -- desired result; not valid as written
  From Table

One approach I tried is to calculate Personal_Income, Share_Income and Job_Income in a first pass, and then in a second pass:

Select 
      Case when Personal_income > Share_Income and Personal_Income > Job_Income 
                then Personal_income 
           when Share_income > Job_Income 
                then Share_income 
           Else Job_income 
      End as greatest_income

But this requires me to scan a billion-row table twice. How can I avoid that and do it in a single pass? Any help is much appreciated.

Tags: sql, hadoop, hive, hiveql

Solution


As of Hive 1.1.0 you can use the greatest() function. The following query computes the result in a single table scan:

select Income1 ,
       Personal_Income ,
       Income2 ,
       Share_Income ,
       Income3 ,
       Job_Income ,
       greatest(Personal_Income, Share_Income, Job_Income ) as greatest_income
from
(
SELECT Income1 ,
       Income1 * 0.02 as Personal_Income ,  -- 2%
       Income2 ,
       Income2 * 0.10 as Share_Income ,     -- 10%
       Income3 ,
       Income3 * 0.01 as Job_Income         -- 1%
  From Table
)s
;
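
For Hive versions older than 1.1.0, where greatest() is not available, the CASE expression from the question can probably be kept while still reading the table once, by applying it on top of the same subquery. The following is only a sketch under assumptions: Table is the placeholder name from the question, the decimal factors stand in for the 2%, 10% and 1% rates, and >= is used instead of > so that ties resolve to one of the tied values. NULL inputs are not handled specially here, so results for rows with NULL incomes may differ from what greatest() returns.

```sql
-- Sketch: same single-scan shape as the greatest() query,
-- with a CASE expression in the outer query instead.
SELECT Income1,
       Personal_Income,
       Income2,
       Share_Income,
       Income3,
       Job_Income,
       CASE
           WHEN Personal_Income >= Share_Income
                AND Personal_Income >= Job_Income
               THEN Personal_Income
           WHEN Share_Income >= Job_Income
               THEN Share_Income
           ELSE Job_Income
       END AS greatest_income
FROM (
    SELECT Income1,
           Income1 * 0.02 AS Personal_Income,  -- 2%
           Income2,
           Income2 * 0.10 AS Share_Income,     -- 10%
           Income3,
           Income3 * 0.01 AS Job_Income        -- 1%
    FROM Table  -- placeholder; substitute the real table name
) s;
```

The inner query only derives the three calculated columns per row and the outer query compares them, so there is no join or aggregation that would call for a second pass over the data.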
