hive - 如何将 hive 的数据类型从字符串更改为多维数组以正确拆分列
问题描述
我有一个蜂巢表,如:
+----------+----------------------------------------------------------------------------------------------+
| DEVTYPE | POINTS |
+----------+----------------------------------------------------------------------------------------------+
| Array | [['1538006400', '629928.0625'], ['1538611200', '629928.0625'], ['1539216000', '629928.0625']]|
| Array | [['1541030400', '629928.0625'], ['1541635200', '629928.0625'], ['1542240000', '629928.0625']]|
| Array | [['1544054400', '629928.0625'], ['1544659200', '629928.0625'], ['1545264000', '629928.125']] |
| Array | [['1547078400', '629928.0625'], ['1547683200', '629928.0625'], ['1548288000', '629928.0625']]|
| Array | [['1550102400', '629928.0625'], ['1550707200', '629928.125'], ['1551312000', '629928.0625']] |
+----------+----------------------------------------------------------------------------------------------+
但是当我描述该表时,我发现 POINTS 列的数据类型为字符串。我想转换为数组类型,以便可以将其拆分为几列以正确分析数据。
我试图通过更改列数据类型来进行更改,但它给了我一个错误,
alter table my_table change points points array<double>;
我尝试使用不同的方法,例如:
alter table my_table change points points array<array<double>>;
我知道上面的命令是错误的,但我还是想试一试,但在这两种情况下我都遇到了这个错误。错误信息:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
points (state=08S01,code=1)
知道如何解决此问题以及如何使该列像数组一样兼容以拆分为不同的列。任何帮助我都会感激。
解决方案
以下方法可能对您有所帮助,注意:(如果需要,请加倍)
CREATE TABLE my_table2( devtype string COMMENT 'from deserializer',
points array<array<string>> COMMENT 'from deserializer') row format delimited
fields terminated by ':' stored as textfile;
insert into table my_table values( "Array","[['1538006400', '629928.0625'],
['1538611200', '629928.0625'], ['1539216000', '629928.0625']]");
insert into table my_table2 select devtype, array(
array(trim(regexp_replace(split(points,",")[0],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[1],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",''))),
array(trim(regexp_replace(split(points,",")[2],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[3],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",''))),
array(trim(regexp_replace(split(points,",")[4],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[5],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",''))))
from my_table_tmp;
hive> select devtype, points[0][0],points[1][0],points[2][0] from my_table2;
OK
Array 1538006400 1538611200 1539216000
hive> select devtype, points[0][1],points[1][1],points[2][1] from my_table2;
OK
Array 629928.0625 629928.0625 629928.0625
推荐阅读
- authentication - 如何为 pydrive2 创建自定义身份验证流程
- python - 我们如何从所有子目录中读取所有 json 文件?
- flask - Flask - 使用蓝图在子域上提供静态文件
- android - 使用 JavaMail 保存和显示附件
- python - 将 Django 应用程序部署到 Heroku 时收到服务器错误 (500)
- azure-web-app-service - GitHub 操作部署到 Azure Web 应用时出错
- angular - 根据所选元素的长度动态改变角垫选择宽度
- android - Firebase 对象访问显示空指针异常
- azure - Azure 负载均衡器使用区域 1,因为订阅 xxx-xxx-xxx-xxx 不支持位置 westus 中的区域 1
- matlab - 虚部为 0 的 FFT bin