BigQuery window ID

Problem description

I need to create a new column holding the id of the first record in each window. For a table like this:

uniqueId | position
-------------------
      01 | First
      02 | Last 
      03 | First
      04 | Cont 
      05 | Cont 
      06 | Cont 
      07 | Cont 
      08 | Cont 
      09 | Cont 
      10 | Last 
      11 | First
      12 | Cont 
      13 | Cont 
      14 | Cont 
      15 | Last 
      16 | First
      17 | Cont 
      18 | Last 

This is the expected result:

uniqueId | position | result
----------------------------
      01 | First    | 01
      02 | Last     | 01
      03 | First    | 03
      04 | Cont     | 03
      05 | Cont     | 03
      06 | Cont     | 03
      07 | Cont     | 03
      08 | Cont     | 03
      09 | Cont     | 03
      10 | Last     | 03
      11 | First    | 11
      12 | Cont     | 11
      13 | Cont     | 11
      14 | Cont     | 11
      15 | Last     | 11
      16 | First    | 16
      17 | Cont     | 16
      18 | Last     | 16

I have tried several different approaches with BigQuery's window functions, but no luck =/

Help me, Obi-Wan Kenobi, you're my only hope.

Tags: google-bigquery

Solution


The following is for BigQuery Standard SQL:

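#standardSQL
SELECT uniqueId, position,
  -- Within each group the 'First' row sorts to the top, so its id is returned for every row of the group.
  FIRST_VALUE(uniqueId) OVER(PARTITION BY grp ORDER BY IF(position = 'First', 0, 1)) result
FROM (
  SELECT uniqueId, position,
    -- Running count of 'First' rows in id order: it increments at each 'First', so it labels each window.
    COUNTIF(position = 'First') OVER(ORDER BY CAST(uniqueId AS INT64)) grp
  FROM `project.dataset.table`
)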
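
To make the two steps of the query easier to follow, here is a minimal Python sketch of the same logic, not the BigQuery implementation itself. The function name `first_id_per_window` and the in-memory `rows` list are hypothetical, introduced only for illustration. The sketch assumes `position` values are compared exactly against 'First', as in the query.

```python
def first_id_per_window(rows):
    """Emulate the query: rows is an iterable of (uniqueId, position) pairs."""
    # Mirrors ORDER BY CAST(uniqueId AS INT64) in the inner window.
    ordered = sorted(rows, key=lambda r: int(r[0]))

    grp = 0            # running COUNTIF(position = 'First')
    first_in_grp = {}  # grp -> uniqueId of the 'First' row that opened it
    out = []
    for unique_id, position in ordered:
        if position == "First":
            grp += 1
            first_in_grp[grp] = unique_id
        # Rows seen before any 'First' get None in this sketch.
        out.append((unique_id, position, grp, first_in_grp.get(grp)))
    return out


# Sample data from the question: ids 01..18 with their positions.
positions = (["First", "Last"]
             + ["First"] + ["Cont"] * 6 + ["Last"]
             + ["First"] + ["Cont"] * 3 + ["Last"]
             + ["First", "Cont", "Last"])
rows = [(f"{i:02d}", p) for i, p in enumerate(positions, start=1)]

for unique_id, position, grp, result in first_id_per_window(rows):
    print(unique_id, position, grp, result)
```

The third printed column corresponds to the intermediate `grp` value, which is constant within each First ... Last run; the fourth is the value that FIRST_VALUE picks for that group.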

You can test and play with this using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '01' uniqueId, 'First' position UNION ALL
  SELECT '02', 'Last ' UNION ALL
  SELECT '03', 'First' UNION ALL
  SELECT '04', 'Cont ' UNION ALL
  SELECT '05', 'Cont ' UNION ALL
  SELECT '06', 'Cont ' UNION ALL
  SELECT '07', 'Cont ' UNION ALL
  SELECT '08', 'Cont ' UNION ALL
  SELECT '09', 'Cont ' UNION ALL
  SELECT '10', 'Last ' UNION ALL
  SELECT '11', 'First' UNION ALL
  SELECT '12', 'Cont ' UNION ALL
  SELECT '13', 'Cont ' UNION ALL
  SELECT '14', 'Cont ' UNION ALL
  SELECT '15', 'Last ' UNION ALL
  SELECT '16', 'First' UNION ALL
  SELECT '17', 'Cont ' UNION ALL
  SELECT '18', 'Last ' 
)
SELECT uniqueId, position, 
  FIRST_VALUE(uniqueId) OVER(PARTITION BY grp ORDER BY IF(position = 'First', 0, 1)) result
FROM (
  SELECT uniqueId, position, 
    COUNTIF(position = 'First') OVER(ORDER BY CAST(uniqueId AS INT64)) grp
  FROM `project.dataset.table`
)
-- ORDER BY uniqueId   

Result

Row uniqueId    position    result   
1   01          First       01   
2   02          Last        01   
3   03          First       03   
4   04          Cont        03   
5   05          Cont        03   
6   06          Cont        03   
7   07          Cont        03   
8   08          Cont        03   
9   09          Cont        03   
10  10          Last        03   
11  11          First       11   
12  12          Cont        11   
13  13          Cont        11   
14  14          Cont        11   
15  15          Last        11   
16  16          First       16   
17  17          Cont        16   
18  18          Last        16    
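
If no BigQuery project is at hand, the same window-function idea can be tried locally. The sketch below is an assumption-laden stand-in, not BigQuery: it uses Python's built-in `sqlite3` module (window functions require SQLite 3.25 or newer), replaces `COUNTIF(...)` with `SUM(CASE ...)` and `IF(...)` with `CASE`, and uses a hypothetical table `t` and helper `run_locally`.

```python
import sqlite3

# SQLite translation of the query above (assumption: SQLite >= 3.25).
QUERY = """
SELECT uniqueId, position,
  FIRST_VALUE(uniqueId) OVER (
    PARTITION BY grp
    ORDER BY CASE WHEN position = 'First' THEN 0 ELSE 1 END
  ) AS result
FROM (
  SELECT uniqueId, position,
    SUM(CASE WHEN position = 'First' THEN 1 ELSE 0 END)
      OVER (ORDER BY CAST(uniqueId AS INTEGER)) AS grp
  FROM t
)
ORDER BY CAST(uniqueId AS INTEGER)
"""


def run_locally(rows):
    """Load (uniqueId, position) pairs into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (uniqueId TEXT, position TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Sample data from the question: ids 01..18 with their positions.
positions = (["First", "Last"]
             + ["First"] + ["Cont"] * 6 + ["Last"]
             + ["First"] + ["Cont"] * 3 + ["Last"]
             + ["First", "Cont", "Last"])
sample = [(f"{i:02d}", p) for i, p in enumerate(positions, start=1)]

for row in run_locally(sample):
    print(*row)
```

Dialect details such as casting and default window frames can differ between SQLite and BigQuery, so treat this as a quick sanity check of the approach rather than a guarantee of identical behavior.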
