首页 > 解决方案 > 如何将 hive 的数据类型从字符串更改为多维数组以正确拆分列

问题描述

我有一个蜂巢表,如:

+----------+----------------------------------------------------------------------------------------------+
| DEVTYPE  |                POINTS                                                                        |
+----------+----------------------------------------------------------------------------------------------+
| Array    | [['1538006400', '629928.0625'], ['1538611200', '629928.0625'], ['1539216000', '629928.0625']]|
| Array    | [['1541030400', '629928.0625'], ['1541635200', '629928.0625'], ['1542240000', '629928.0625']]|       
| Array    | [['1544054400', '629928.0625'], ['1544659200', '629928.0625'], ['1545264000', '629928.125']] |
| Array    | [['1547078400', '629928.0625'], ['1547683200', '629928.0625'], ['1548288000', '629928.0625']]|
| Array    | [['1550102400', '629928.0625'], ['1550707200', '629928.125'], ['1551312000', '629928.0625']] |
+----------+----------------------------------------------------------------------------------------------+

但是当我描述该表时,我发现 POINTS 列的数据类型为字符串。我想转换为数组类型,以便可以将其拆分为几列以正确分析数据。

我试图通过更改列数据类型来进行更改,但它给了我一个错误,

alter table my_table change points points array<double>;

我尝试使用不同的方法,例如:

alter table my_table change points points array<array<double>>; 

我知道上面的命令是错误的,但我还是想试一试,但在这两种情况下我都遇到了这个错误。错误信息:

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions :
points (state=08S01,code=1)

知道如何解决此问题以及如何使该列像数组一样兼容以拆分为不同的列。任何帮助我都会感激。

标签: hive

解决方案


以下方法可能对您有所帮助,注意:(如果需要,请加倍)

CREATE TABLE my_table2( devtype string COMMENT 'from deserializer', 
points array<array<string>> COMMENT 'from deserializer') row format delimited 
fields terminated by ':' stored as textfile;

insert into table my_table values( "Array","[['1538006400', '629928.0625'],
 ['1538611200', '629928.0625'], ['1539216000', '629928.0625']]");

insert into table my_table2 select devtype, array(
array(trim(regexp_replace(split(points,",")[0],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[1],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",''))), 
array(trim(regexp_replace(split(points,",")[2],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[3],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",''))), 
array(trim(regexp_replace(split(points,",")[4],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')),
trim(regexp_replace(split(points,",")[5],"\\[|\\[\\[\\'|\\'|\\]|\\]\\]",'')))) 
from my_table_tmp;


hive> select devtype, points[0][0],points[1][0],points[2][0] from my_table2;
OK
Array   1538006400      1538611200      1539216000

hive> select devtype, points[0][1],points[1][1],points[2][1] from my_table2;
OK
Array   629928.0625     629928.0625     629928.0625


推荐阅读