json - How to import complex JSON data into Hive
Problem description
As input, I have this JSON file to import into Hive:
[
{
"code": "ACPBC3P",
"libelle": "Bon de commande Prime de satisfaction ACP",
"libelleCourt": "Bon de commande Prime de satisfaction ACP",
"libelleLong": "Bon de commande Prime de satisfaction ACP",
"dureeStockage": 24,
"dureeArchivage": 96,
"dureeEpuration": 120,
"dureeStockageReelle": 24,
"dureeArchivageReelle": 96,
"dureeEpurationReelle": 120,
"typologie": {
"code": "ACP",
"libelle": "ACP - Activ'projet"
},
"sousTypologie": {
"code": "ACPBC3P",
"libelle": "BC3P - Bon de commande Prime de satisfaction"
}
},
{
"code": "ACPC1",
"libelle": "C1 - Demande d'avoir",
"libelleCourt": "C1 - Demande d'avoir",
"libelleLong": "C1 - Demande d'avoir",
"dureeStockage": 36,
"dureeArchivage": 84,
"dureeEpuration": 120,
"dureeStockageReelle": 36,
"dureeArchivageReelle": 84,
"dureeEpurationReelle": 120,
"typologie": {
"code": "ACP",
"libelle": "ACP - Activ'projet"
},
"sousTypologie": {
"code": "ACPC1",
"libelle": "C1 - Demande d'avoir"
}
},
{
"code": "ACPC2",
"libelle": "C2 - Relance fournisseur",
"libelleCourt": "C2 - Relance fournisseur",
"libelleLong": "C2 - Relance fournisseur",
"dureeStockage": 36,
"dureeArchivage": 84,
"dureeEpuration": 120,
"dureeStockageReelle": 36,
"dureeArchivageReelle": 84,
"dureeEpurationReelle": 120,
"typologie": {
"code": "ACP",
"libelle": "ACP - Activ'projet"
},
I tried to capture this information with this complex type:
ARRAY<STRUCT<`code`: STRING, `libelle`: STRING, `libelleCourt`: STRING, `libelleLong`: STRING, `dureeStockage`: INT, `dureeArchivage`: INT, `dureeEpuration`: INT, `dureeStockageReelle`: INT, `dureeArchivageReelle`: INT, `dureeEpurationReelle`: INT, `typologie`: STRUCT<`code`: STRING, `libelle`: STRING>, `sousTypologie`: STRUCT<`code`: STRING, `libelle`: STRING>, `modeCapture`: STRUCT<`code`: STRING, `libelle`: STRING>, `master`: STRING, `codeActivite`: STRING>>
but unfortunately it does not work!
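Solution
You did not mention which error you are facing. In general, there are two things to keep in mind when using the JSON SerDe:
org.apache.hadoop.hive.serde2.JsonSerDe does not support JSON data that starts with a square bracket '['.
JsonSerDe is based on the text SerDe, so every newline is treated as the start of a new record.
Valid format: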
{"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
{"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
{"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
{"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}
Invalid format 1 (the data is wrapped in a top-level array):
[
{"world_rank": "1","country": "China","population": "1388232694","World": "0.185"},
{"world_rank": "2","country": "India","population": "1342512706","World": "0.179"},
{"world_rank": "3","country": "U.S.","population": "326474013","World": "0.043"},
{"world_rank": "4","country": "Indonesia","population": "263510146","World": "0.035"}
]
Invalid format 2 (each object spans several lines):
{
"world_rank": "1",
"country": "China",
"population": "1388232694",
"World": "0.185"
},
{
"world_rank": "2",
"country": "India",
"population": "1342512706",
"World": "0.179"
},
{
"world_rank": "3",
"country": "U.S.",
"population": "326474013",
"World": "0.043"
},
{
"world_rank": "4",
"country": "Indonesia",
"population": "263510146",
"World": "0.035"
}
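The two rules above can be checked before a file is handed to Hive. The following is a minimal sketch in Python, not part of the original answer; the function name check_serde_layout is hypothetical. It only tests the layout (no leading '[', one complete object per line) and tolerates a trailing comma after the object, as in the valid example above. It does not reproduce how JsonSerDe itself parses a line.

```python
import json


def check_serde_layout(text):
    """Return a list of layout problems found in `text`.

    A line is accepted when it begins with one complete JSON object.
    Anything after that object (such as a trailing comma) is ignored.
    """
    problems = []
    decoder = json.JSONDecoder()
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are skipped by this check
        if stripped.startswith("["):
            problems.append(f"line {number}: starts with '['")
            continue
        try:
            # raw_decode parses the first JSON value and stops there.
            value, _ = decoder.raw_decode(stripped)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not a complete JSON object")
            continue
        if not isinstance(value, dict):
            problems.append(f"line {number}: not a JSON object")
    return problems
```

An empty result means every non-blank line holds one whole object; each entry otherwise names the first rule that the line breaks.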
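Before loading the input data into the Hive table, it should be preprocessed into the following format, with one complete JSON object per line and no enclosing array: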
{"code":"ACPBC3P","libelle":"Bon de commande Prime de satisfaction ACP","libelleCourt":"Bon de commande Prime de satisfaction ACP","libelleLong":"Bon de commande Prime de satisfaction ACP","dureeStockage":24,"dureeArchivage":96,"dureeEpuration":120,"dureeStockageReelle":24,"dureeArchivageReelle":96,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPBC3P","libelle":"BC3P - Bon de commande Prime de satisfaction"}},
{"code":"ACPC1","libelle":"C1 - Demande d'avoir","libelleCourt":"C1 - Demande d'avoir","libelleLong":"C1 - Demande d'avoir","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"},"sousTypologie":{"code":"ACPC1","libelle":"C1 - Demande d'avoir"}}
{"code":"ACPC2","libelle":"C2 - Relance fournisseur","libelleCourt":"C2 - Relance fournisseur","libelleLong":"C2 - Relance fournisseur","dureeStockage":36,"dureeArchivage":84,"dureeEpuration":120,"dureeStockageReelle":36,"dureeArchivageReelle":84,"dureeEpurationReelle":120,"typologie":{"code":"ACP","libelle":"ACP - Activ'projet"}}
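One way to do this preprocessing is a short Python script. This is a sketch under stated assumptions, not the original answer's method: it assumes the input file is a complete, valid top-level JSON array small enough to fit in memory, and the function names and file paths are hypothetical.

```python
import json


def array_to_json_lines(text):
    """Convert a top-level JSON array into one compact object per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    lines = []
    for record in records:
        # Compact separators keep each record on a single line;
        # ensure_ascii=False leaves accented characters untouched.
        lines.append(json.dumps(record, ensure_ascii=False, separators=(",", ":")))
    return "".join(line + "\n" for line in lines)


def convert_file(src_path, dst_path):
    """Read the array from src_path and write line-delimited JSON to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        converted = array_to_json_lines(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(converted)
```

For example, convert_file("input.json", "data.jsonl") (both paths are placeholders) would write a file that can then be loaded into the table.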
DDL:
CREATE TABLE data (
code STRING,
libelle STRING,
libelleCourt STRING,
libelleLong STRING,
dureeStockage INT,
dureeArchivage INT,
dureeEpuration INT,
dureeStockageReelle INT,
dureeArchivageReelle INT,
dureeEpurationReelle INT,
typologie struct<code: STRING, libelle: STRING>,
sousTypologie struct<code: STRING, libelle: STRING>
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
STORED AS TEXTFILE;
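The column list in this DDL mirrors the structure of one JSON record: strings become STRING, integers become INT, and nested objects become struct<...>. As an illustration only, the sketch below derives such a column list from a sample record in Python. The helper names hive_type and hive_columns are hypothetical, and the type mapping is an assumption (for instance, large integers would need BIGINT rather than INT).

```python
def hive_type(value):
    """Map a decoded JSON value to a Hive type name (assumed mapping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        fields = ", ".join(f"{key}: {hive_type(item)}" for key, item in value.items())
        return f"struct<{fields}>"
    if isinstance(value, list) and value:
        # The element type is taken from the first item only.
        return f"array<{hive_type(value[0])}>"
    raise ValueError(f"cannot infer a Hive type for {value!r}")


def hive_columns(record):
    """Return 'name TYPE' strings for each top-level field of a record."""
    return [f"{name} {hive_type(value)}" for name, value in record.items()]
```

Fields that are missing or null in the sample record are not covered, so the output should be reviewed against several records before it is used in a CREATE TABLE statement.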
Queries to select the data:
select soustypologie.code from data;
select typologie.libelle from data;