首页 > 解决方案 > BigQuery MERGE 查询不适用于空的 REPEATED 字段

问题描述

我正在尝试对MERGE具有以下架构的表运行查询:

[
   {
      "mode":"REQUIRED",
      "name":"container_ref",
      "type":"STRING"
   },
   {
      "mode":"REQUIRED",
      "name":"receive_date",
      "type":"DATE"
   },
   {
      "mode":"REQUIRED",
      "name":"vendor_id",
      "type":"INTEGER"
   },
   {
      "fields":[
         {
            "mode":"NULLABLE",
            "name":"code",
            "type":"STRING"
         },
         {
            "mode":"NULLABLE",
            "name":"quantity",
            "type":"INTEGER"
         }
      ],
      "mode":"REPEATED",
      "name":"products",
      "type":"RECORD"
   },
   {
      "mode":"REQUIRED",
      "name":"max_fill_ratio",
      "type":"FLOAT"
   }
]

我正在尝试运行以下查询:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , [] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

如您所见,在源数据中,products是一个空数组。我从查询中收到以下错误:

ARRAY 类型的值不能分配给 T.products,其类型为 ARRAY<STRUCT<code STRING, quantity INT64>>

products未设置为空数组的情况下运行相同的查询时,它可以正常工作。例如:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , [STRUCT('01757' AS code, 10 AS quantity),STRUCT('03831' AS code, 20 AS quantity)] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

我无法弄清楚为什么MERGE查询中的空数组会导致此错误。我已经尝试使用产品的空数组插入和更新表格,它工作正常。例如,此查询可以正常工作:

INSERT INTO `project.dataset.table`
(container_ref, receive_date, vendor_id, max_fill_ratio, products)
VALUES (
  'yongkang_210222'
  , DATE('2021-02-22')
  , 51695
  , 0.5
  , []
)

这开始感觉像是一个 BigQuery 错误。有人知道为什么会这样吗?

标签: google-bigquery

解决方案


我不知道为什么,但是ARRAY<STRUCT<code STRING, quantity INT64>>在数组之前添加已经解决了这个问题:

MERGE `project.dataset.table` T
USING (
  WITH container_data AS (
    SELECT
        'yongkang_210222' AS container_ref
      , DATE('2021-02-22') AS receive_date
      , 51695 AS vendor_id
      , 0.5 AS max_fill_ratio
      , ARRAY<STRUCT<code STRING, quantity INT64>> [] AS products
  )
  SELECT * FROM container_data
) S
ON T.container_ref = S.container_ref
WHEN MATCHED THEN
  UPDATE SET T.products = S.products, T.max_fill_ratio = S.max_fill_ratio
WHEN NOT MATCHED THEN
  INSERT (container_ref, receive_date, vendor_id, products, max_fill_ratio)
  VALUES(container_ref, receive_date, vendor_id, products, max_fill_ratio)
WHEN NOT MATCHED BY SOURCE AND receive_date = '2021-02-22' THEN
  DELETE

现在它对于空数组和非空数组都可以正常工作。如果有人可以提供解释,我很想了解为什么这是必要的。


推荐阅读