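sql - Generalizing Top N queries in BigQuery
Problem description
This is a follow-up question that generalizes the case of getting the top N results across multiple columns in BigQuery. Suppose we have the following data: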
year  genre    studio     title        revenue
2014  fantasy  fox        avatar       10
2015  fantasy  fox        avatar       12
2016  fantasy  fox        avatar       12
2015  action   sony       spider-man   10
2015  romance  paramount  love letter  15
2015  action   sony       spider-man   10
2015  action   sony       spider-man   10
2015  action   disney     toy story    10
2015  action   sony       edgar        4
2015  action   sony       thomas       1
2015  fantasy  fox        avatar       2
I would like to get the following result in order to build a tree structure:
Past 2 years, Top 2 genres (Alphabetically), Top 2 studios (by Count), Top 2 titles by SUM Revenue DESC
So we would get something like this:
Conceptually, the query I am hoping to write looks like this:
SELECT year, genre, studio, title, SUM(revenue)
FROM titles
GROUP BY year, genre, studio, title
// in pseudocode
ORDER BY
(year DESC) LIMIT 2,
(genre ASC) LIMIT 10,
(COUNT(studio) DESC) LIMIT 2,
(SUM(revenue) DESC) LIMIT 2
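What is the best way to do the above? This is really a generalization of building a tree structure in BQ.
Solution
In the innermost subquery, rank the years so that only rows from the 2 most recent years are kept, and at the same time compute the number of rows per studio (within each year and genre) and the sum of revenue per title.
Then compute ranks by genre, by studio count, and by title revenue, and filter each rank to the top 2.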
select year, genre, studio, title, revenue
from (
  select year, genre, studio, title, revenue,
    -- genres ranked alphabetically within each year
    dense_rank() over (partition by year order by genre) as genre_rank,
    -- studios ranked by row count within each year and genre
    dense_rank() over (partition by year, genre order by count_by_studio desc) as studio_rank,
    -- titles ranked by total revenue within each year, genre, and studio
    dense_rank() over (partition by year, genre, studio order by revenue_by_title desc) as title_rank
  from (
    select year,
      genre,
      studio,
      title,
      revenue,
      -- 1 = most recent year in the table
      dense_rank() over (order by year desc) as year_rank,
      count(*) over (partition by year, genre, studio) as count_by_studio,
      sum(revenue) over (partition by year, genre, studio, title) as revenue_by_title
    from titles
  ) where year_rank <= 2  -- keep the 2 most recent years
) where genre_rank <= 2
  and studio_rank <= 2
  and title_rank <= 2;
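To try the ranking logic without a BigQuery project, here is a minimal sketch that runs the same nested window-function query against the sample data in an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite stands in for BigQuery (it needs SQLite 3.25 or newer for window functions), the helper name `top_n_tree` is hypothetical, and the outer select uses `DISTINCT` over the windowed totals so that each title appears once with its summed revenue, which is closer to the `GROUP BY` in the question than the row-level output of the query above.

```python
import sqlite3

# Sample rows from the question: (year, genre, studio, title, revenue)
ROWS = [
    (2014, "fantasy", "fox", "avatar", 10),
    (2015, "fantasy", "fox", "avatar", 12),
    (2016, "fantasy", "fox", "avatar", 12),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "romance", "paramount", "love letter", 15),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "action", "disney", "toy story", 10),
    (2015, "action", "sony", "edgar", 4),
    (2015, "action", "sony", "thomas", 1),
    (2015, "fantasy", "fox", "avatar", 2),
]

# Same three-level structure as the answer; only the outer select differs
# (DISTINCT over the windowed totals plus an explicit ORDER BY).
QUERY = """
select distinct year, genre, studio, title, count_by_studio, revenue_by_title
from (
  select year, genre, studio, title, count_by_studio, revenue_by_title,
    dense_rank() over (partition by year order by genre) as genre_rank,
    dense_rank() over (partition by year, genre
                       order by count_by_studio desc) as studio_rank,
    dense_rank() over (partition by year, genre, studio
                       order by revenue_by_title desc) as title_rank
  from (
    select year, genre, studio, title,
      dense_rank() over (order by year desc) as year_rank,
      count(*) over (partition by year, genre, studio) as count_by_studio,
      sum(revenue) over (partition by year, genre, studio, title)
        as revenue_by_title
    from titles
  ) where year_rank <= 2
) where genre_rank <= 2
  and studio_rank <= 2
  and title_rank <= 2
order by year desc, genre, count_by_studio desc, studio, revenue_by_title desc
"""


def top_n_tree(rows=ROWS):
    """Load rows into an in-memory table and return the filtered tree rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table titles "
            "(year integer, genre text, studio text, title text, revenue integer)"
        )
        conn.executemany("insert into titles values (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_tree():
        print(row)
```

On the sample data, the three spider-man rows collapse into a single row with a total revenue of 30, the romance genre is dropped because it ranks third alphabetically in 2015, and thomas is dropped as the third title under sony. Because `dense_rank` is used, ties share a rank, so a level can return more than 2 entries when values are equal.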