Generalizing Top N queries in BigQuery

Problem description

This is a follow-up question that generalizes getting the Top N results across multiple columns in BigQuery. Suppose we have the following data:

 year   genre         studio            title       revenue
2014    fantasy       fox               avatar      10
2015    fantasy       fox               avatar      12
2016    fantasy       fox               avatar      12
2015    action        sony              spider-man  10
2015    romance       paramount         love letter 15
2015    action        sony              spider-man  10
2015    action        sony              spider-man  10
2015    action        disney            toy story   10
2015    action        sony              edgar       4
2015    action        sony              thomas      1
2015    fantasy       fox               avatar      2

I would like to get the following result in order to build a tree structure:

Past 2 years, Top 2 genres (Alphabetically), Top 2 studios (by Count), Top 2 titles by SUM Revenue DESC

So we would get something like this:

[Image in the original post, not reproduced here]

Conceptually, the query I would like to write is something like this:

SELECT year, genre, studio, title, SUM(revenue)
FROM titles
GROUP BY year, genre, studio, title

-- in pseudocode (not valid SQL)
ORDER BY
    (year DESC) LIMIT 2,
    (genre ASC) LIMIT 2,
    (COUNT(studio) DESC) LIMIT 2,
    (SUM(revenue) DESC) LIMIT 2

What would be the best way to do the above? This is really a generalization of building a tree structure in BQ.

Tags: sql, google-bigquery, pivot, pivot-table

Solution


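In a subquery, rank the years so that only the latest 2 can be kept, and at the same time compute the count of movies per studio and the sum of revenue per title.

Then rank by genre, by studio count, and by title revenue, and filter each rank to the top 2.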

-- Outer query: keep the top 2 genres, studios, and titles within each parent group.
-- Source rows are returned as-is, so revenue here is per row, not summed.
select year, genre, studio, title, revenue
from (
    select year, genre, studio, title, revenue,
        -- genres alphabetically within a year
        dense_rank() over (partition by year order by genre) as genre_rank,
        -- studios by row count within a year and genre
        dense_rank() over (partition by year, genre order by count_by_studio desc) as studio_rank,
        -- titles by total revenue within a year, genre, and studio
        dense_rank() over (partition by year, genre, studio order by revenue_by_title desc) as title_rank
    from (
        select year,
            genre,
            studio,
            title,
            revenue,
            dense_rank() over (order by year desc) as year_rank,
            count(*) over (partition by year, genre, studio) as count_by_studio,
            sum(revenue) over (partition by year, genre, studio, title) as revenue_by_title
        from titles
    ) where year_rank <= 2  -- latest 2 years only
) where genre_rank <= 2
and studio_rank <= 2
and title_rank <= 2;
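
To check the ranking logic against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard `sqlite3` module. SQLite is only a stand-in for BigQuery here, on the assumption that both treat these window functions the same way. The function name `top_n_tree_rows` is hypothetical. The query is a variation on the answer, not the answer itself: it uses `select distinct` with `revenue_by_title` so each title appears once with its summed revenue, and adds an `order by` for stable output.

```python
import sqlite3

# Sample rows copied from the question: (year, genre, studio, title, revenue).
SAMPLE_ROWS = [
    (2014, "fantasy", "fox", "avatar", 10),
    (2015, "fantasy", "fox", "avatar", 12),
    (2016, "fantasy", "fox", "avatar", 12),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "romance", "paramount", "love letter", 15),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "action", "sony", "spider-man", 10),
    (2015, "action", "disney", "toy story", 10),
    (2015, "action", "sony", "edgar", 4),
    (2015, "action", "sony", "thomas", 1),
    (2015, "fantasy", "fox", "avatar", 2),
]

# Same ranking logic as the answer. Assumptions of this sketch: SELECT DISTINCT
# on revenue_by_title (one row per title) and an ORDER BY for stable output.
QUERY = """
select distinct year, genre, studio, title, revenue_by_title
from (
    select year, genre, studio, title, revenue_by_title,
        dense_rank() over (partition by year order by genre) as genre_rank,
        dense_rank() over (partition by year, genre
                           order by count_by_studio desc) as studio_rank,
        dense_rank() over (partition by year, genre, studio
                           order by revenue_by_title desc) as title_rank
    from (
        select year, genre, studio, title,
            dense_rank() over (order by year desc) as year_rank,
            count(*) over (partition by year, genre, studio) as count_by_studio,
            sum(revenue) over (partition by year, genre, studio, title)
                as revenue_by_title
        from titles
    ) where year_rank <= 2
) where genre_rank <= 2
and studio_rank <= 2
and title_rank <= 2
order by year desc, genre, studio, revenue_by_title desc
"""


def top_n_tree_rows():
    """Load the sample data into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table titles "
            "(year integer, genre text, studio text, title text, revenue integer)"
        )
        conn.executemany("insert into titles values (?, ?, ?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_tree_rows():
        print(row)
```

On the sample data this drops 2014 (third most recent year), romance (third genre alphabetically in 2015), and thomas (third title by revenue under sony), leaving five rows. Because `dense_rank()` assigns tied values the same rank, a level can return more than 2 entries when there are ties; `row_number()` would be one way to enforce a strict cap.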
