首页 > 解决方案 > Want to parse string BLOB as JSON, modify few fields and parse it back to string BIGQUERY

问题描述

So I have a column which contains JSONs as string BLOBs. For example,

{
    "a": "a-1",
    "b": "b-1",
    "c":
    {
        "c-1": "c-1-1",
        "c-2": "c-2-1"
    },
    "d":
    [
        {
            "k": "v1"
        },
        {
            "k": "v2"
        },
        {
            "k": "v3"
        }
    ]
}

Now my use case is to parse the key b, hash the value of key b, assign it back to the key b and store it back as a string in Bigquery.

I initially tried to do a lazy approach where I am only extracting the key b using the json_extract_scalar function in Bigquery and for other keys (like c and d - which I dont want to modify), I used json_extract function. Then I converted back to string after doing hashing the key b. Here is the query -

SELECT
  TO_JSON_STRING( 
      STRUCT( 
          json_EXTRACT(COL_NAME, "$.a") AS a,
        MD5(json_extract_scalar(_airbyte_data,"$.b")) AS b,
          json_EXTRACT(COL_NAME,"$.c") AS c,
          json_EXTRACT(COL_NAME,"$.d") AS d ) )
FROM
  dataset.TABLE

But the issue with this query is the JSON objects are getting converted to string and double quotes getting escaped due to TO_JSON_STRING (I tried using CAST AS STRING on top of STRUCT but it isn't supported). For example, the output row for this query looks like this:

{
    "a": "a-1",
    "b": "b-1",
    "c":
    "{
        \"c-1\": \"c-1-1\",
        \"c-2\": \"c-2-1\"
    }",
    "d":
    "[
        {
            \"k\": \"v1\"
        },
        {
            \"k\": \"v2\"
        },
        {
            \"k\": \"v3\"
        }
    ]"
}

I can achieve the required output if I use JSON_EXTRACT and JSON_EXTRACT_SCALAR functions on every key (and on every nested keys) but this approach isn't scalable as I have close to 200 keys and many of them are nested 2-3 levels deep. Can anyone suggest a better approach of achieving this? TIA

标签: sqljsongoogle-bigquery

解决方案


这应该工作

declare _json string default """{
    "a": "a-1",
    "b": "b-1",
    "c":
    {
        "c-1": "c-1-1",
        "c-2": "c-2-1"
    },
    "d":
    [
        {
            "k": "v1"
        },
        {
            "k": "v2"
        },
        {
            "k": "v3"
        }
    ]
}""";

SELECT regexp_replace(_json, r'"b": "(\w+)-(\w+)"',concat('"b":',TO_JSON_STRING( MD5(json_extract_scalar(_json,"$.b")))))

输出

{
    "a": "a-1",
    "b":"sdENsgFsL4PBOyX8sXDN6w==",
    "c":
    {
        "c-1": "c-1-1",
        "c-2": "c-2-1"
    },
    "d":
    [
        {
            "k": "v1"
        },
        {
            "k": "v2"
        },
        {
            "k": "v3"
        }
    ]
}

如果您需要特定的正则表达式,请指定 b 值的示例。


推荐阅读