首页 > 解决方案 > 取消嵌套存储在列中的 JSON 字符串 [BigQuery]

问题描述

我有一个表,其中一列包含原始 JSON 字符串,如下所示: 在此处输入图像描述

存储在 order_lines 中的示例 JSON:

{
   "STR_BLK_002":{
      "amount":167,
      "type":"part spare",
      "total_discount":0,
      "color":"Black",
      "is_out_of_stock":false,
      "variable_fields":{
         "Size":"XL",
         "trueColor":"Black"
      },
      "category_id":"44356721",
      "status_list":[
         {
            "id":1,
            "time":"2021-04-01T15:01:54.746Z",
            "status":"ORDER PLACED"
         },
         {
            "id":2,
            "time":"2021-04-02T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "id":3,
            "time":"2021-04-04T10:31:01.719Z",
            "status":"SHIPPED"
         },
         {
            "id":3,
            "time":"2021-04-04T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "product_id":270,
      "price_per_quantity":167,
      "quantity":1,
      "maximum_quantity":10,
      "variant_name":"Helmet strap",
      "current_status":30,
      "estimated_delivery":"09 Apr 2021",
      "total_before_discount":167,
      "delivery_statuses":[
         {
            "time":"2021-04-01T15:10:13.594Z",
            "status":"FULFILLABLE"
         },
         {
            "time":"2021-04-02T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "time":"2021-04-03T10:31:01.197Z",
            "status":"READY_TO_SHIP"
         },
         {
            "time":"2021-04-04T10:31:01.719Z",
            "status":"DISPATCHED"
         },
         {
            "time":"2021-04-04T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "sku_code":"STR_BLK_002"
   }
}

我想取消嵌套这个字符串,以便可以单独访问键值对。此外,sku_code(上面共享的示例中的“STR_BLK_002”)在任何其他列中均不可用,并且该字符串可以包含多个单个 sku,因此如果有 2 个对应于订单的 sku,则 JSON 字符串将为:

{
   "STR_BLK_002":{
      "amount":167,
      "type":"part spare",
      "total_discount":0,
      "color":"Black",
      "is_out_of_stock":false,
      "variable_fields":{
         "Size":"XL",
         "trueColor":"Black"
      },
      "category_id":"44356721",
      "status_list":[
         {
            "id":1,
            "time":"2021-04-01T15:01:54.746Z",
            "status":"ORDER PLACED"
         },
         {
            "id":2,
            "time":"2021-04-02T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "id":3,
            "time":"2021-04-04T10:31:01.719Z",
            "status":"SHIPPED"
         },
         {
            "id":3,
            "time":"2021-04-04T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "product_id":270,
      "price_per_quantity":167,
      "quantity":1,
      "maximum_quantity":10,
      "variant_name":"Helmet strap",
      "current_status":3,
      "estimated_delivery":"09 Apr 2021",
      "total_before_discount":167,
      "delivery_statuses":[
         {
            "time":"2021-04-01T15:10:13.594Z",
            "status":"FULFILLABLE"
         },
         {
            "time":"2021-04-02T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "time":"2021-04-03T10:31:01.197Z",
            "status":"READY_TO_SHIP"
         },
         {
            "time":"2021-04-04T10:31:01.719Z",
            "status":"DISPATCHED"
         },
         {
            "time":"2021-04-04T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "sku_code":"STR_BLK_002"
   },
   "STR_BLK_008":{
      "amount":590,
      "type":"accessory",
      "total_discount":0,
      "color":"blue",
      "is_out_of_stock":false,
      "variable_fields":{
         "Size":"XL",
         "trueColor":"prussian blue"
      },
      "category_id":"65577970",
      "status_list":[
         {
            "id":1,
            "time":"2021-04-06T15:01:54.746Z",
            "status":"ORDER PLACED"
         },
         {
            "id":2,
            "time":"2021-04-07T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "id":3,
            "time":"2021-04-07T10:31:01.719Z",
            "status":"SHIPPED"
         },
         {
            "id":3,
            "time":"2021-04-08T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "product_id":276,
      "price_per_quantity":590,
      "quantity":1,
      "maximum_quantity":5,
      "variant_name":"Car Perfume",
      "current_status":3,
      "estimated_delivery":"09 Apr 2021",
      "total_before_discount":590,
      "delivery_statuses":[
         {
            "time":"2021-04-06T15:10:13.594Z",
            "status":"FULFILLABLE"
         },
         {
            "time":"2021-04-07T10:31:00.397Z",
            "status":"PACKED"
         },
         {
            "time":"2021-04-07T10:31:01.197Z",
            "status":"READY_TO_SHIP"
         },
         {
            "time":"2021-04-08T10:31:01.719Z",
            "status":"DISPATCHED"
         },
         {
            "time":"2021-04-10T18:12:06.896Z",
            "status":"SHIPPED"
         }
      ],
      "sku_code":"STR_BLK_008"
   }
}

我想将此信息分成单独的列,以便我可以获取每个 SKU 的相应值。

标签: jsongoogle-bigqueryunnest

解决方案


所以基本上我认为你想要做的是首先将你的列转换为一个结构数组,而不是这样:

{
   "STR_BLK_002": {...},
   "STR_BLK_003": {...}
}

你有这样的事情:

[
  {
      "amount":167,
      "type":"part spare",
      "total_discount":0,
       ...
   },
  { 
      "amount":590,
      "type":"accessory",
      "total_discount":0,
      ...
  }
]

使用该格式的数据,您可以利用UNNEST将每个条目放入自己的行中,然后使用JSON 函数将字段提取到自己的列中,例如JSON_EXTRACT_SCALAR

为了做到这一点,我构建了一个Javascript UDF,它在对象中查找键,然后遍历每个键以创建一个结构数组。

CREATE TEMP FUNCTION format_json(str STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS r"""
  var obj = JSON.parse(str);
  var keys = Object.keys(obj);
  var arr = [];
  for (i = 0; i < keys.length; i++) {
    arr.push(JSON.stringify(obj[keys[i]]));
  }
  return arr;
""";

SELECT 
  JSON_EXTRACT_SCALAR(formatted_json,'$.amount') as amount
  ,JSON_EXTRACT_SCALAR(formatted_json,'$.type') as type
  ,JSON_EXTRACT_SCALAR(formatted_json,'$.total_discount') as total_discount
  ,JSON_EXTRACT_SCALAR(formatted_json,'$.color') as color
  ,JSON_EXTRACT_SCALAR(formatted_json,'$.is_out_of_stock') as is_out_of_stock
  ,JSON_EXTRACT_SCALAR(formatted_json,'$.sku_code') as sku_code
from
testing.json_test
left join unnest(format_json(order_lines)) as formatted_json

结果是:

在此处输入图像描述


推荐阅读