首页 > 解决方案 > 连续几十年按十年排名前 N(在 SQL Server 中)

问题描述

我正在尝试为Top 5最近的 6 个十年中的每一个获取按十年分组的(即最常见的)文档标题的排名列表。

文档标题不唯一。在任何给定的日历年中,可能有数十甚至数百个具有相同标题的文档。

以下查询是我所能做到的。它给了我前 5 名,但仅限于“所有其他人”十年。

如何修改查询以获得其他十年的前 5 个标题?

SELECT
    Top 5 documentTitle AS 'Title',
    RANK() OVER (PARTITION BY calendarYear ORDER BY COUNT(documentTitle) DESC) AS Rank,
    COUNT(tblDocumentFact.inventionTitleEnID) AS 'Number of Occurrences',
    CASE
        WHEN calendarYear BETWEEN 2010 AND 2019 THEN '2010 - 2019'
        WHEN calendarYear BETWEEN 2000 AND 2009 THEN '2000 - 2009'
        WHEN calendarYear BETWEEN 1990 AND 1999 THEN '1990 - 1999'
        WHEN calendarYear BETWEEN 1980 AND 1989 THEN '1980 - 1989'
        WHEN calendarYear BETWEEN 1970 AND 1979 THEN '1970 - 1979'
        WHEN calendarYear BETWEEN 1960 AND 1969 THEN '1960 - 1969'
        ELSE 'all others'
    END AS Decade
FROM        tbldocumentTitleDimension
INNER JOIN  tblDocumentFact     ON  tbldocumentTitleDimension.documentTitleID   = tblDocumentFact.documentTitleID
INNER JOIN  tblDateDimension    ON  tblDocumentFact.publicationDateID           = tblDateDimension.dateID
GROUP BY    documentTitle,
            calendarYear
ORDER BY    [Number of Occurrences] DESC

标签: sqlsql-servergroup-bygreatest-n-per-groupwindow-functions

解决方案


如果我没记错的话,你想每十年前 5 名。如果是这样的话:

  • 您需要group by十年而不是日历年才能获得正确的计数;在子查询中计算十年更容易,因此您不必重复case表达式

  • 排名应该按decade分区而不是每年计算

  • 然后,您可以使用该列在外部查询中进行过滤

考虑:

select *
from (
    select
        dtd.documenttitle as title,
        rank() over (partition by dd.decade order by count(*) desc) as rnk,
        count(*) as number_of_occurrences,
        dd.decade
    from tbldocumentTitleDimension dtd
    inner join tblDocumentFact df on dtd.documenttitleid   = df.documenttitleid
    inner join  (
        select 
            dateid,
            case
                when calendarYear between 2010 and 2019 then '2010 - 2019'
                when calendarYear between 2000 and 2009 then '2000 - 2009'
                when calendarYear between 1990 and 1999 then '1990 - 1999'
                when calendarYear between 1980 and 1989 then '1980 - 1989'
                when calendarYear between 1970 and 1979 then '1970 - 1979'
                when calendarYear between 1960 and 1969 then '1960 - 1969'
                else 'all others'
            end AS decade
            from tblDateDimension
    ) dd on df.publicationdateid  = dd.dateid
    group by dtd.documenttitle, dd.decade
) t
where rnk <= 5
order by decade, number_of_occurrences desc

旁注:

  • 不要使用单引号作为标识符(尽管 SQL Server 允许这样做,单引号应该保留给 litteral stings,如 SQL 标准中所定义) - 更好的是,您可以使用不需要引用的标识符

  • 在多表查询中,始终使用它们所属的表来限定所有列名;我在这里做了一些假设

  • 除非您null在列documentTitle中有不想计算在内的值,否则可以使用count(*)而不是count(documentTitle)- 这是直接的,更有效的


推荐阅读