首页 > 解决方案 > 由于 coorelated 子查询,Bigquery 作业从今天开始突然开始失败

问题描述

由于以下错误,我的 Bigquery 作业直到昨天才开始执行失败

错误:- 不支持引用其他表的相关子查询,除非它们可以去相关,例如通过将它们转换为有效的 JOIN

询问:-

with result as (
  select
    *
  from
  (
    select * from `project.dataset_stage.=non_split_daily_temp`
    union all
    select * from `project.dataset_stage.split_daily_temp`
  )
)
select
  *
from
  result final
where
  not (
    exists
    (
      select
        1
      from
        `project.dataset.bqt_sls_cust_xref` target
      where
        final.sls_dte = target.sls_dte and
        final.rgs_id = target.rgs_id and
    ) and
    unlinked = 'Y' and
   cardmatched = 'Y'
  }

有人可以帮助我吗,我想知道突然中断的原因以及如何永久解决此问题。

标签: google-bigquery

解决方案


感谢您的建议。

我们找出了问题的原因,下面是原因

当一个人写相关的子查询时

select T2.col, (select count(*) from T1 where T1.col = T2.col) from T2

从技术上讲,SQL 文本意味着需要为 T2 中的每一行重新执行子查询。如果 T2 有十亿行,那么我们需要扫描 T1 十亿次。这将花费很长时间,并且查询永远不会完成。

如果实现如下,执行查询的成本从 O(size T1 * size T2) 下降到 O(size T1 + size T2)

select any_value(t.col), count(*) from
t left join T1 on T1.col = t.col
group by t.primary_key````

BigQuery errors out if it can't find a way to optimize correlated subquery into linear cost O(size T1 + size T2).

We have plenty of patterns that we recognize and we rewrite for correlated subqueries but apparently new view definition made subquery too complex and query optimizer was unable find a way to run it in linear complexity algorithm.

Probably google will fix the issue by identifying the better algorithm. 


推荐阅读