Getting the last row and its rank by id in BigQuery

Problem description

I want to get the last id and its rank (ranked by date_update ascending, then ordered by id descending) and display the id and the rank. I wrote the following query:

SELECT id AS data,
       RANK() OVER (ORDER BY date_update) AS rank
FROM `test.sample`
ORDER BY id DESC
LIMIT 1

It works on other tables, but on some tables that contain a large amount of data it fails with this error:

Resources exceeded during query execution: The query could not be executed in the allotted memory.

I have read the BigQuery troubleshooting errors documentation and tried removing the ORDER BY, but the query still fails. What should I do?

Sample data:

id     date_update
22     2019-10-04
14     2019-10-01
24     2019-10-03
13     2019-10-02

Process:

  1. RANK() OVER (ORDER BY date_update)

id   date_update   rank

14   2019-10-01    1
13   2019-10-02    2
24   2019-10-03    3
22   2019-10-04    4
  2. ORDER BY id DESC on the result above

id   date_update   rank
24   2019-10-03    3
22   2019-10-04    4
14   2019-10-01    1
13   2019-10-02    2
    

This is the expected result:

id   rank
24   3

Tags: google-bigquery, sql-order-by, partitioning

Solution


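You can use the query below. It first finds the row with the largest id (the latest id), then queries the source table again, using that row's date_update as a filter and counting the rows whose date_update is on or before it. That count is the rank of the row.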

WITH
`test.sample` AS
(
  -- Inline sample data standing in for the real table
  SELECT 22 AS id, DATE('2019-10-04') AS date_update UNION ALL
  SELECT 14 AS id, DATE('2019-10-01') AS date_update UNION ALL
  SELECT 24 AS id, DATE('2019-10-03') AS date_update UNION ALL
  SELECT 13 AS id, DATE('2019-10-02') AS date_update
),
max_id_row AS
(
  -- Keep only the row with the largest id, as a single STRUCT
  SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] AS vals
  FROM `test.sample`
)
-- Count the rows dated on or before the max-id row; this equals its rank
-- as long as no other row shares that date_update
SELECT m.vals.id, m.vals.date_update, COUNT(*) AS rank
FROM `test.sample` AS t
JOIN max_id_row AS m
  ON t.date_update <= m.vals.date_update
GROUP BY 1, 2
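
One caveat about the query above: COUNT(*) with <= matches RANK() only when no other row has the same date_update as the max-id row. RANK() gives tied rows the lowest position in the tie, which is 1 plus the number of rows that sort strictly earlier. Below is a minimal sketch of a tie-safe variant, not a definitive implementation. It assumes the same id and date_update columns as the question; the inline CTE named sample is a hypothetical stand-in for `test.sample` and exists only to make the example self-contained.

```sql
WITH sample AS (
  -- Hypothetical inline stand-in for `test.sample`; swap in the real table.
  SELECT 22 AS id, DATE('2019-10-04') AS date_update UNION ALL
  SELECT 14 AS id, DATE('2019-10-01') AS date_update UNION ALL
  SELECT 24 AS id, DATE('2019-10-03') AS date_update UNION ALL
  SELECT 13 AS id, DATE('2019-10-02') AS date_update
),
max_id_row AS (
  -- Row with the largest id, using the same ARRAY_AGG technique as above.
  SELECT ARRAY_AGG(STRUCT(id, date_update) ORDER BY id DESC LIMIT 1)[OFFSET(0)] AS vals
  FROM sample
)
SELECT
  m.vals.id AS id,
  -- RANK() = 1 + number of rows that sort strictly before this one.
  1 + COUNTIF(t.date_update < m.vals.date_update) AS rank
FROM sample AS t
CROSS JOIN max_id_row AS m  -- max_id_row has exactly one row
GROUP BY 1
```

On the four sample rows this returns id 24 with rank 3, the same as the expected result in the question. Like the query above, it replaces the window function with a plain aggregation. The original query's RANK() OVER (ORDER BY date_update) has no PARTITION BY, so all rows fall into a single window partition, which is the likely cause of the memory error on large tables.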
