java - How to access a column from a sub query in outer query using Spark SQL
Problem Description
In spark SQL, I executed the query
select
Retailer_country,
max(sold)
from (
select
Retailer_country,
count(*) as sold
from ProductInfo
where year=2013 and Product_type='Watches'
group by Retailer_country
)
to a dataset having columns
Retailer_country
Order_method_type
Retailer_type
Product_line
Product_type
Product
Year
Quarter
Revenue
Quantity
Gross_margin
It returns an exception
org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and '__auto_generated_subquery_name.`Retailer_country`' is not an aggregate function. Wrap '(max(__auto_generated_subquery_name.`solds`) AS `max(solds)`)' in windowing function(s) or wrap '__auto_generated_subquery_name.`Retailer_country`' in first() (or first_value) if you don't care which value you get.;;
Aggregate [Retailer_country#10, max(solds#77L) AS max(solds)#80L]
+- SubqueryAlias `__auto_generated_subquery_name`
+- Aggregate [Retailer_country#10], [Retailer_country#10, count(1) AS solds#77L]
+- Filter ((cast(year#16 as int) = 2013) && (Product_type#14 = Watches))
+- SubqueryAlias `productinfo`
+- Relation[Retailer_country#10,Order_method_type#11,Retailer_type#12,Product_line#13,Product_type#14,Product#15,Year#16,Quarter#17,Revenue#18,Quantity#19,Gross_margin#20] csv
When I executed the same query in an online compiler with the same table structure, it returned
USA|4
Sample Data:
USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
USA|Fax|Outdoors Shop|Camping Equipment|watch|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2013|Q1 2012|59628.66|489|0.34754797
UK|Fax|Outdoors Shop|Camping Equipment|Cooking Gear|TrailChef Deluxe Cook Set|2012|Q1 2012|59628.66|489|0.34754797
Why do the results differ, and how can I display the Retailer_country in Spark?
Any help to solve this would be appreciated.
Solution
You have two aggregates: one for the sub-query and one for the main query. Your query must therefore look like this:
select
Retailer_country,
max(sold)
from (
select
Retailer_country,
count(*) as sold
from ProductInfo
where year=2013 and Product_type='Watches'
group by Retailer_country)
group by Retailer_country
However, if we take a deeper look at your query, the subquery already returns distinct Retailer_country values because of the GROUP BY. Hence, you do not need the outer max(sold) aggregate at all. As a result, your final query is actually this one:
select
Retailer_country,
count(*) as sold
from ProductInfo
where year=2013 and Product_type='Watches'
Group by Retailer_country
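The simplified query can be sanity-checked outside Spark. The sketch below replays it against the sample rows using Python's built-in sqlite3 module; note that the sample data spells the product type 'watch' while the question's filter says 'Watches', so the filter here uses 'watch' to match the sample rows (an assumption about the intended data).

```python
import sqlite3

# In-memory table mirroring the relevant ProductInfo columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductInfo (Retailer_country TEXT, Product_type TEXT, Year INT)"
)

# Rows reduced from the sample data: (country, product type, year).
rows = [
    ("USA", "Cooking Gear", 2012),
    ("USA", "Cooking Gear", 2013),
    ("USA", "watch", 2013),
    ("USA", "watch", 2013),
    ("USA", "watch", 2013),
    ("UK",  "watch", 2013),
    ("UK",  "watch", 2013),
    ("USA", "watch", 2013),
    ("UK",  "Cooking Gear", 2013),
    ("UK",  "Cooking Gear", 2012),
]
conn.executemany("INSERT INTO ProductInfo VALUES (?, ?, ?)", rows)

# The simplified query: GROUP BY yields one row per country,
# already distinct, so no outer max() is needed.
result = conn.execute("""
    select Retailer_country, count(*) as sold
    from ProductInfo
    where Year = 2013 and Product_type = 'watch'
    group by Retailer_country
""").fetchall()
print(sorted(result))  # [('UK', 2), ('USA', 4)]
```

Each country appears exactly once in the output, which is why wrapping this in an outer max(sold) without its own GROUP BY triggers the AnalysisException.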
Edit: According to the provided comment, to return only the country with the highest count:
select
Retailer_country,
count(*) as sold
from ProductInfo
where year=2013 and Product_type='Watches'
group by Retailer_country
Order by sold desc
limit 1
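The edited query reproduces the online compiler's USA|4 by ordering the per-country counts and keeping the top row. A minimal self-contained check with Python's sqlite3 (again assuming the filter should match the sample data's 'watch' spelling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductInfo (Retailer_country TEXT, Product_type TEXT, Year INT)"
)

# Condensed sample data: 4 USA watch rows and 2 UK watch rows in 2013.
rows = [("USA", "watch", 2013)] * 4 + [("UK", "watch", 2013)] * 2 + [
    ("USA", "Cooking Gear", 2013), ("UK", "Cooking Gear", 2013),
    ("USA", "Cooking Gear", 2012), ("UK", "Cooking Gear", 2012),
]
conn.executemany("INSERT INTO ProductInfo VALUES (?, ?, ?)", rows)

# Order the per-country counts descending and keep only the top one.
top = conn.execute("""
    select Retailer_country, count(*) as sold
    from ProductInfo
    where Year = 2013 and Product_type = 'watch'
    group by Retailer_country
    order by sold desc
    limit 1
""").fetchone()
print(top)  # ('USA', 4)
```

The same ORDER BY ... LIMIT 1 form works unchanged in Spark SQL, avoiding the subquery and the outer aggregate entirely.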