Adding a GROUP BY inside the window frame clause when creating partitions

Problem description

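Using a dataset hosted on Google (the MLB data) as an example, this is what I am trying to accomplish: get the strikes over the last 3 weeks for a given venue.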

My aggregated dataset looks like this, without the strikes_3wk column: [image]

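The logic for the strikes_3wk column is to partition the aggregated dataset by venueName, order it by the YearWeek column, and then take the aggregated strikes for the last 3 weeks.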

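This is the query I have written so far. I can see that the window function is where I need to change the logic. So, is there a way to add a GROUP BY inside the window function? Is there another way to do this?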

In the image, I added a new column, "Expected", showing the values for two weeks.

select inr.*
       ,sum(inr.strikes) over (Venue_Week rows between current row and 2 following) as strikes_3wk
from
(
    select seasonType
        ,gameStatus
        ,homeTeamName
        ,awayTeamName
        ,venueName
        ,CAST(
        CONCAT(
            CAST(EXTRACT(YEAR FROM createdAt) as string)
            ,CAST(EXTRACT(WEEK(Monday) FROM createdAt) as string)
            ) as INT64)
            as YearWeek
        ,sum(homeFinalRuns) as homeFinalRuns
        ,sum(strikes) as strikes
    from  `bigquery-public-data.baseball.games_wide`
    where   createdAt is not null
    group by seasonType
        ,gameStatus
        ,homeTeamName
        ,awayTeamName
        ,venueName
        ,YearWeek
)inr
window Venue_Week as (
    partition by inr.venueName
    order by inr.YearWeek desc
)

Tags: sql, google-bigquery, window-functions

Solution


So you are looking for the strikes per venue, regardless of who made them, right?

Maybe something like this:

SELECT INR.*, STATS.strikes_3wk 
FROM `bigquery-public-data.baseball.games_wide` INR
  LEFT JOIN (
    SELECT venueName, SUM(strikes) as strikes_3wk 
    FROM `bigquery-public-data.baseball.games_wide` INR2
    WHERE YearWeek IN (
      SELECT TOP 3 YearWeek 
      FROM `bigquery-public-data.baseball.games_wide` 
      WHERE venueName = INR2.venueName
      ORDER BY YearWeek DESC
    )
    GROUP BY venueName
  ) STATS 
    ON INR.venueName = STATS.venueName
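
Note that the query above uses TOP 3, which is SQL Server syntax and is not accepted by BigQuery, and it filters on YearWeek, which is a derived value in the question rather than a column of games_wide. It also returns one total per venue instead of a rolling total per week. A minimal BigQuery sketch of the same idea follows, assuming that "last 3 weeks" means the current week plus the two previous weeks that have data for the same venue. The names game_week, venue_week, rolling and weekStart are illustrative and do not come from the original post. The week key is a DATE rather than the concatenated YearWeek integer, because year and week concatenated without zero padding do not sort in calendar order (for example, 20169 sorts before 201552).

```sql
WITH game_week AS (
  -- Same grouping as in the question, but with a DATE week key
  -- (the Monday that starts the week) so that weeks sort correctly.
  SELECT
    seasonType,
    gameStatus,
    homeTeamName,
    awayTeamName,
    venueName,
    DATE_TRUNC(DATE(createdAt), WEEK(MONDAY)) AS weekStart,
    SUM(homeFinalRuns) AS homeFinalRuns,
    SUM(strikes) AS strikes
  FROM `bigquery-public-data.baseball.games_wide`
  WHERE createdAt IS NOT NULL
  GROUP BY seasonType, gameStatus, homeTeamName, awayTeamName, venueName, weekStart
),
venue_week AS (
  -- Collapse to one row per venue and week, so that a ROWS frame
  -- counts weeks instead of counting team/game rows.
  SELECT venueName, weekStart, SUM(strikes) AS strikes
  FROM game_week
  GROUP BY venueName, weekStart
),
rolling AS (
  -- Current week plus the two preceding weeks present for the venue.
  SELECT
    venueName,
    weekStart,
    SUM(strikes) OVER (
      PARTITION BY venueName
      ORDER BY weekStart
      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS strikes_3wk
  FROM venue_week
)
-- Attach the per-venue rolling total back to the detail rows.
SELECT g.*, r.strikes_3wk
FROM game_week AS g
JOIN rolling AS r USING (venueName, weekStart)
ORDER BY g.venueName, g.weekStart
```

If a venue can skip weeks and the total should cover calendar weeks rather than the three most recent weeks with data, one option is to order the window by UNIX_DATE(weekStart) and use RANGE BETWEEN 14 PRECEDING AND CURRENT ROW instead of the ROWS frame.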
