json - Extracting a JSON field from a string in Hive using a dataset
Problem description
I am trying a very basic Hive query. I want to extract a JSON field from a dataset, but I always get \N for the JSON field, while some_string comes back fine.
Here is my query:
WITH dataset AS (
SELECT
CAST(
'{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}' AS STRING) AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset
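Question: how do I get the JSON field here?
Solution
The problem is the backslashes. A single backslash is treated as an escape character for " and is removed by Hive: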
hive> select '\"';
OK
"
Time taken: 0.069 seconds, Fetched: 1 row(s)
When you write two backslashes, Hive removes one of them:
hive> select '\\"';
OK
\"
Time taken: 0.061 seconds, Fetched: 1 row(s)
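This is why the original query yields \N. Once Hive drops the backslashes, the inner quotes are no longer escaped, so the value of additionalData ends early and the string is no longer valid JSON. get_json_object returns NULL for invalid JSON, which is shown as \N in the question. The Python sketch below only illustrates this outside Hive. It uses a shortened version of the payload and the standard json module as a stand-in for Hive's parser, and the names after_single, after_double, and is_valid_json are hypothetical.

```python
import json

# What Hive ends up with when the literal contains \" (backslash dropped).
after_single = '{ "traceId": "abc", "additionalData": "{"Star Rating":"3"}" }'

# What Hive ends up with when the literal contains \\" (one backslash kept).
after_double = '{ "traceId": "abc", "additionalData": "{\\"Star Rating\\":\\"3\\"}" }'


def is_valid_json(text):
    """Return True if text parses as JSON (stand-in for Hive's own parser)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(after_single))          # False: inner quotes end the value early
print(is_valid_json(after_double))          # True: inner quotes are still escaped
print(json.loads(after_double)["traceId"])  # abc
```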
With two backslashes the query works correctly:
WITH dataset AS (
SELECT
CAST(
'{ "traceId": "abc", "additionalData": "{\\"Star Rating\\":\\"3\\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}' AS STRING) AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;
OK
{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
} abc
Time taken: 0.788 seconds, Fetched: 1 row(s)
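Alternatively, you can simply remove the double quote before { and the one after } in additionalData, which turns it into a nested JSON object: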
WITH dataset AS (
SELECT
regexp_replace(regexp_replace(
'{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
}' ,'\\"\\{','\\{') ,'\\}\\"','\\}' )AS some_string
)
SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;
Returns:
OK
{ "traceId": "abc", "additionalData": {"Star Rating":"3"}, "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
} abc
Time taken: 7.035 seconds, Fetched: 1 row(s)
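The effect of the two nested regexp_replace calls can be reproduced outside Hive. The following Python sketch is an illustration only: it assumes a shortened payload in the form Hive sees after dropping the single backslashes, and the variable names raw, step1, and cleaned are hypothetical.

```python
import json
import re

# Shortened payload as Hive sees it after dropping the single backslashes.
raw = '{ "traceId": "abc", "additionalData": "{"Star Rating":"3"}", "locale": "en_US" }'

# Drop the quote before { and the quote after }, mirroring the two replacements.
step1 = re.sub(r'"\{', '{', raw)
cleaned = re.sub(r'\}"', '}', step1)

# additionalData is now a nested object instead of a broken string.
parsed = json.loads(cleaned)
print(cleaned)
print(parsed["additionalData"]["Star Rating"])
```

Note that these patterns would also rewrite any legitimate string value that happens to start with { or end with }, so this approach is only safe when the payload contains no such values.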