首页 > 解决方案 > BQ:不支持引用其他表的相关子查询 - 其中 ARRAY_AGG 不实用

问题描述

我们在 BQ 中利用ARRAYSTRUCT很多,直到由于主题错误而无法更改数组的内容。请参阅下面使用公共数据的简单示例。假设INNER JOIN由于图像丢失、故意或错误而无法正常工作。

现在,我知道通常您可以将left join数组重新定义移至FROM子句并使用ARRAY_AGG,但这并不总是可能的。

在我们的例子中,要更新的数组之外的“其他字段”是其他数组或结构——比如github-nested表。

由于您无法执行SELECT DISTINCTon STRUCTorARRAY字段,因此您最终需要 UNNEST 所有内容并使用许多 ARRAY_AGG、大量资源消耗和 OOM 风险从头开始重新创建表。这对于具有大量嵌套字段的表来说是不可能的。

SELECT
  * EXCEPT(webDetection),
  STRUCT(
    webDetection.partialMatchingImages,
    webDetection.pagesWithMatchingImages,
    webDetection.fullMatchingImages,
    ARRAY(
      SELECT AS STRUCT
        fmi.score,
        fmi.url,
        i.object_id
      FROM
        data.webDetection.fullMatchingImages fmi
      LEFT JOIN
        `bigquery-public-data.the_met.images` i
      ON
        fmi.url = i.original_image_url
      ) AS fullMatchingImages_from_met,
    webDetection.webEntities
  ) AS webDetection
FROM
  `bigquery-public-data.the_met.vision_api_data` data

知道如何避免重新聚合吗?

标签: google-bigquery

解决方案


假设从概念上讲您的查询是正确的,唯一的问题是错误correlated subqueries that reference other tables are not supported- 尝试替换下面的片段

FROM
  data.webDetection.fullMatchingImages fmi
LEFT JOIN
  `bigquery-public-data.the_met.images` i
ON
  fmi.url = i.original_image_url

FROM
  data.webDetection.fullMatchingImages fmi
CROSS JOIN
  `bigquery-public-data.the_met.images` i
WHERE
  fmi.url = i.original_image_url   

更新,添加不匹配的网址

SELECT * EXCEPT(webDetection),
  STRUCT(
    webDetection.partialMatchingImages,
    webDetection.pagesWithMatchingImages,
    webDetection.fullMatchingImages,
    ARRAY(
      SELECT AS STRUCT *      
      FROM t.webDetection.fullMatchingImages_from_met_temp
      UNION ALL
      SELECT AS STRUCT *, NULL
      FROM t.webDetection.fullMatchingImages
      WHERE NOT url IN (SELECT url FROM t.webDetection.fullMatchingImages_from_met_temp)
    ) AS fullMatchingImages_from_met,
    webDetection.webEntities
    ) AS webDetection
FROM (
  SELECT * EXCEPT(webDetection),
    STRUCT(
      webDetection.partialMatchingImages,
      webDetection.pagesWithMatchingImages,
      webDetection.fullMatchingImages,
      ARRAY(
        SELECT AS STRUCT
          fmi.score,
          fmi.url,
          i.object_id
        FROM data.webDetection.fullMatchingImages fmi
        JOIN `bigquery-public-data.the_met.images` i
        ON fmi.url = i.original_image_url
      ) AS fullMatchingImages_from_met_temp,
      webDetection.webEntities
    ) AS webDetection
  FROM `bigquery-public-data.the_met.vision_api_data` data
) t 

推荐阅读