sql - Hive: How do I get the minimum value per year and employee?
Problem description
I have the following sample table:
+------------+-----------+----------+
| orderdate  | employee  | minprice |
+------------+-----------+----------+
| 1992-01-13 | Clerk#943 | 7328.08  |
| 1992-02-21 | Clerk#328 | 33818.37 |
| 1992-02-22 | Clerk#328 | 914.01   |
| 1992-03-03 | Clerk#943 | 10010.11 |
| 1992-03-19 | Clerk#158 | 2712.00  |
| 1992-03-20 | Clerk#328 | 23920.52 |
| 1992-04-05 | Clerk#158 | 919.01   |
| 1993-01-04 | Clerk#943 | 24786.27 |
| 1993-01-29 | Clerk#158 | 11856.13 |
| 1993-01-30 | Clerk#943 | 2712.00  |
| 1993-02-17 | Clerk#328 | 42958.47 |
| 1993-02-25 | Clerk#328 | 2703.00  |
+------------+-----------+----------+
How can I get the minimum value for each employee in each year? Expected output:
+------------+-----------+----------+
| orderdate  | employee  | minprice |
+------------+-----------+----------+
| 1992-01-13 | Clerk#943 | 7328.08  |
| 1992-02-22 | Clerk#328 | 914.01   |
| 1992-04-05 | Clerk#158 | 919.01   |
| 1993-01-30 | Clerk#943 | 2712.00  |
| 1993-01-29 | Clerk#158 | 11856.13 |
| 1993-02-25 | Clerk#328 | 2703.00  |
+------------+-----------+----------+
What I have so far:
SELECT o_orderdate, o_employee, min(sales) AS minprice
FROM orders
INNER JOIN sales
ON o_orderkey = s_orderkey
GROUP BY o_orderdate, o_employee
GROUPING SETS ((o_orderdate, o_employee));
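The query runs, but it does not give me the minimum per year and per employee: grouping by the full order date produces one group per date rather than per year. I couldn't find much documentation on how to do this in Hive with grouping sets.
Any help is appreciated.
Solution
It looks like you need a window function rather than aggregation. Number each employee's orders within a year from cheapest to most expensive, then keep only the first row of each partition: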
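SELECT o_orderdate, o_employee, sales AS minprice
FROM (SELECT o.o_orderdate, o.o_employee, s.sales,
             -- rank each employee's orders within a calendar year, cheapest first
             ROW_NUMBER() OVER (PARTITION BY o.o_employee, YEAR(o.o_orderdate)
                                ORDER BY s.sales) AS seqnum
      FROM orders o JOIN
           sales s
           ON o.o_orderkey = s.s_orderkey
     ) os
-- keep only the cheapest order per (employee, year)
WHERE seqnum = 1;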
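If you want to check the logic without a Hive cluster, the following is a minimal sketch that runs the same ROW_NUMBER pattern against the sample rows in an in-memory SQLite database using Python's standard sqlite3 module. It rests on a few assumptions that are not part of the original answer: the orders/sales join is collapsed into a single hypothetical table `orders_with_sales`, SQLite's `strftime('%Y', ...)` stands in for Hive's `YEAR()`, and the bundled SQLite must be version 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Sample rows from the question: (orderdate, employee, price).
ROWS = [
    ("1992-01-13", "Clerk#943", 7328.08),
    ("1992-02-21", "Clerk#328", 33818.37),
    ("1992-02-22", "Clerk#328", 914.01),
    ("1992-03-03", "Clerk#943", 10010.11),
    ("1992-03-19", "Clerk#158", 2712.00),
    ("1992-03-20", "Clerk#328", 23920.52),
    ("1992-04-05", "Clerk#158", 919.01),
    ("1993-01-04", "Clerk#943", 24786.27),
    ("1993-01-29", "Clerk#158", 11856.13),
    ("1993-01-30", "Clerk#943", 2712.00),
    ("1993-02-17", "Clerk#328", 42958.47),
    ("1993-02-25", "Clerk#328", 2703.00),
]


def min_price_per_employee_year(rows):
    """Return (orderdate, employee, minprice) for the cheapest order per employee per year."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical single table standing in for the orders/sales join.
        conn.execute(
            "CREATE TABLE orders_with_sales (o_orderdate TEXT, o_employee TEXT, sales REAL)"
        )
        conn.executemany("INSERT INTO orders_with_sales VALUES (?, ?, ?)", rows)
        # Same shape as the Hive query; strftime('%Y', ...) replaces YEAR(...).
        query = """
            SELECT o_orderdate, o_employee, sales AS minprice
            FROM (SELECT o_orderdate, o_employee, sales,
                         ROW_NUMBER() OVER (
                             PARTITION BY o_employee, strftime('%Y', o_orderdate)
                             ORDER BY sales
                         ) AS seqnum
                  FROM orders_with_sales) os
            WHERE seqnum = 1
            ORDER BY o_orderdate
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in min_price_per_employee_year(ROWS):
        print(row)
```

Note that ROW_NUMBER() returns exactly one row per partition even if two orders tie on the lowest price; swapping in RANK() would keep all tied rows instead.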