Loading JSON into BigQuery: a field is sometimes an array and sometimes a string

Question

I am trying to load JSON data into BigQuery. An excerpt of the data that causes the problem looks like this:

 [{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]
 {"Value":"456","Code":"A"}
 [{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]
 {"Value":"Z","Code":"A"}

I have defined the schema for this field as:

  {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "Code",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "Value",
        "type": "STRING"
      }
    ],
    "mode": "REPEATED",
    "name": "Properties",
    "type": "RECORD"
  }

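However, I cannot get both the string values and the array values extracted into a single repeated field. This SQL successfully extracts the string values: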

JSON_EXTRACT_SCALAR(json_string,'$.Properties.Code') as Code,
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Value') as Value

This SQL successfully extracts the array values:

  ARRAY(
    SELECT
      STRUCT(
        JSON_EXTRACT_SCALAR(Properties_Array,'$.Code') AS Code,
        JSON_EXTRACT_SCALAR(Properties_Array,'$.Value') AS Value
      )
    FROM UNNEST(JSON_EXTRACT_ARRAY(json_string,'$.Properties')) Properties_Array)
  AS Properties

I am looking for a way to have BigQuery read this string as a one-element array, rather than preprocessing the data. Is this possible in #StandardSQL?

Tags: arrays, google-cloud-platform, struct, google-bigquery

Answer

The example below is for BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.table` as (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}' UNION ALL
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]}' UNION ALL
  SELECT '{"Properties": {"Value":"Z","Code":"A"}}'  
)
SELECT json_string, 
  ARRAY(
    SELECT STRUCT(
        JSON_EXTRACT_SCALAR(Properties,'$.Code') AS Code,
        JSON_EXTRACT_SCALAR(Properties,'$.Value') AS Value
      )
    FROM UNNEST(IFNULL(
      JSON_EXTRACT_ARRAY(json_string,'$.Properties'), 
      [JSON_EXTRACT(json_string,'$.Properties')])) Properties
  ) AS Properties  
FROM `project.dataset.table`      
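
The query above appears to rely on JSON_EXTRACT_ARRAY returning NULL when the path does not point at a JSON array, so that IFNULL can fall back to a one-element array built from JSON_EXTRACT. The following is a small sketch for checking that behaviour on two of the sample rows; the CTE name `samples` and the output column names are made up for illustration and are not part of the original answer.

```sql
#standardSQL
-- Illustrative check only: how each function treats the two shapes of Properties.
-- `samples` and the output column names are hypothetical.
WITH samples AS (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' AS json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}'
)
SELECT
  json_string,
  -- Expected to be TRUE for the single-object row, since the path is not a JSON array there
  JSON_EXTRACT_ARRAY(json_string, '$.Properties') IS NULL AS array_extract_is_null,
  -- The raw JSON text found at the path, whatever its shape
  JSON_EXTRACT(json_string, '$.Properties') AS raw_properties,
  -- Number of elements that UNNEST would see after the IFNULL fallback
  ARRAY_LENGTH(IFNULL(
    JSON_EXTRACT_ARRAY(json_string, '$.Properties'),
    [JSON_EXTRACT(json_string, '$.Properties')])) AS element_count
FROM samples
```

If the fallback behaves as assumed, the single-object row reports one element and the array row reports two, which is why the same UNNEST can serve both shapes.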

The original post showed the query output as an image, which is not reproduced here.
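
BigQuery also provides JSON_QUERY_ARRAY, JSON_QUERY and JSON_VALUE, which play the same roles as JSON_EXTRACT_ARRAY, JSON_EXTRACT and JSON_EXTRACT_SCALAR. The sketch below repeats the same fallback idea with those functions. It is a variant under the assumption that JSON_QUERY_ARRAY likewise returns NULL for a non-array value; it is not taken from the original answer, and the names `samples` and `property` are placeholders.

```sql
#standardSQL
-- Sketch of the same one-element-array fallback using the JSON_QUERY family.
-- Assumes JSON_QUERY_ARRAY yields NULL when $.Properties is a single object.
WITH samples AS (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' AS json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}'
)
SELECT json_string,
  ARRAY(
    SELECT STRUCT(
        JSON_VALUE(property, '$.Code') AS Code,
        JSON_VALUE(property, '$.Value') AS Value
      )
    -- Use the array when there is one, otherwise wrap the single object in an array
    FROM UNNEST(IFNULL(
      JSON_QUERY_ARRAY(json_string, '$.Properties'),
      [JSON_QUERY(json_string, '$.Properties')])) AS property
  ) AS Properties
FROM samples
```

The resulting Properties column is an array of structs with Code and Value fields, matching the REPEATED RECORD schema given in the question.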
