首页 > 解决方案 > hive sql 内连接 - 如何在同一查询中获取 sum & row_num

问题描述

我有 2 个表 product & psales,表中的数据如下所示

select * from psales;
+-------------+---------------+--+
| psales.pid  | psales.sales  |
+-------------+---------------+--+
| 1           | 100           |
| 1           | 150           |
| 1           | 200           |
| 2           | 75            |
| 2           | 45            |
| 2           | 145           |
| 3           | 176           |
| 3           | 99            |
| 1           | 27            |
| 4           | 51            |
+-------------+---------------+--+

select * from product;
+--------------+----------------+--+
| product.pid  | product.pname  |
+--------------+----------------+--+
| 1            | p1             |
| 2            | p2             |
| 3            | p3             |
| 4            | p4             |
+--------------+----------------+--+

目标是获得总销售额第二高的产品。

这是我目前用来获取综合销售额最高的产品的查询(工作正常)

select p1.pname, p1.total_sales
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales 
      from product p
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
limit 1;

我如何获得总销售额第二高的产品?

当我尝试在内部查询中获取 row_num 时,它给了我如下所示的错误:

select p1.pname as pname, p1.total_sales as total_sales, row_num() over (partition by pname order by pname) as rownum 
from (select p.pid as pid, p.pname as pname, s.sales as sales,  
      sum(s.sales) over (partition by p.pid order by p.pid) as total_sales, 
      row_num() over (partition by p.pid) as rownum 
      from product p 
      inner join psales s on (p.pid = s.pid) 
      order by total_sales desc) p1 
where rownum =2;

错误:编译语句时出错:FAILED:SemanticException 无法将窗口调用分解为组。至少 1 个组必须仅依赖于输入列。还要检查循环依赖。基础错误:无效的函数 row_num (state=42000,code=40000)

提前感谢您的帮助。

标签: hivehiveql

解决方案


row_number()在上层子查询中使用函数。看来你不需要 analytic sum(),简单group by就可以了:

select p1.pname, p1.pid, p1.total_sales
  from
(
select p1.pname, p1.pid, p1.total_sales, 
       row_number() over (order by total_sales  desc) rn
  from 
     (select p.pid, p.pname, sum(s.sales) as total_sales 
        from product p 
             inner join psales s on p.pid = s.pid
        group by p.pid, p.pname
     )p1
)s
where rn=2
;

如果要选择具有相同销售额的所有产品,请使用dense_rank()而不是。row_number()


推荐阅读