首页 > 解决方案 > 在 Hive 中使用 Regex Serde 创建表返回错误

问题描述

我在 Hive 中使用 Regex Serde 创建了一个表。在 Hue 中,它返回表的创建成功。但是,当我尝试返回表格SELECT * FROM pricefile_edited或以色调查看表格时,它不起作用并且我得到了 Error 。

数据为 130 个字符(每行),没有分隔符。

有谁知道似乎是什么问题,并提供帮助?谢谢

CREATE EXTERNAL TABLE pricefile_edited(
field1 STRING,
field2 STRING,
field3 STRING,
field4 STRING,
field5 STRING,
field6 STRING, 
field7 STRING,
field8 STRING,
field9 STRING,
field10 STRING,
field11 STRING,
field12 STRING,
field13 STRING,
field14 STRING,
field15 STRING,
field16 STRING,
field17 STRING,
field18 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
 WITH SERDEPROPERTIES ("input.regex" = 
"(\\.{12})(\\.{1})(\\.{1})(\\.{24})(\\.{6})(\\.{6})(\\.{13})(\\.{6})(\\.{1})(\\.{4})(\\.{1})(\\.{3})(\\.{17})(\\.{9})(\\.{12})(\\.{1})(\\.{1})(\\.
{12})")
LOCATION '/user/hive/warehouse';

我收到此错误:

请求 TFetchResultsReq(fetchType=0, operationHandle=TOperationHandle(hasResultSet=True, modifiedRowCount=None, operationType=0, operationId=THandleIdentifier(secret='\xc3\xd7\x97\xd3coB\xa1\x90P\x9e\xab\ x82\xa4\xf4A', guid='\x80\xa1\x93\xe2\x10\xefJ\xd9\xa3\xa3\xdb\x1f\x95\x85\x88\xb3')), 方向=4, maxRows= 100): TFetchResultsResp(status=TStatus(errorCode=0, errorMessage='java.io.IOException: java.io.IOException: 不是文件: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories' , sqlState=None, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:java.io.IOException: java.io.IOException: 不是文件: hdfs://quickstart.cloudera:8020/user/蜂巢/仓库/类别:25:24','org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:463', 'org.apache.hive.service.cli.operation.OperationManager:getOperationNextRowSet:OperationManager.java:294', 'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java: 769'、'sun.reflect.GeneratedMethodAccessor20:invoke::-1'、'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43'、'java.lang.reflect.Method:invoke:Method.java:498' , 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78', 'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36', 'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63', 'java.security.AccessController:doPrivileged:AccessController.java:-2', 'javax.security.auth.Subject:doAs:Subject.java:422', 'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1917', 'org.apache.hive.service.cli.session.HiveSessionProxy:invoke: HiveSessionProxy.java:59'、'com.sun.proxy.$Proxy21:fetchResults::-1'、'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:462'、'org.apache .hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:694', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553', 'org .apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift .TBaseProcessor:进程:TBaseProcessor.java:39'、'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56'、'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286'、'java. util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149'、'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624'、'java.lang.Thread:run:Thread.java:748' , '*java.io.IOException:java.io.IOException: 不是文件: hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:29:4', 'org.apache.hadoop.hive .ql.exec.FetchOperator:getNextRow:FetchOperator.java:508', 'org.apache.hadoop.hive.ql.exec.FetchOperator:pushRow:FetchOperator.java:415', 'org.apache.hadoop.hive.ql .exec.FetchTask:fetch:FetchTask.java:140', 'org.apache.hadoop.hive.ql.Driver:getResults:Driver.java:2069', 'org.apache.hive.service.cli.operation.SQLOperation:getNextRowSet:SQLOperation.java:458', '*java.io.IOException :不是文件:hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3', 'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat.java:322', 'org .apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache .hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)'*java.io.IOException: 不是文件:hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3', 'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat. java:322', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java: 295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)'*java.io.IOException: 不是文件:hdfs://quickstart.cloudera:8020/user/hive/warehouse/categories:32:3', 'org.apache.hadoop.mapred.FileInputFormat:getSplits:FileInputFormat. java:322', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextSplits:FetchOperator.java:363', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java: 295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), results=None, hasMoreRows=None)hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), 结果=无,有更多行=无)hadoop.hive.ql.exec.FetchOperator:getRecordReader:FetchOperator.java:295', 'org.apache.hadoop.hive.ql.exec.FetchOperator:getNextRow:FetchOperator.java:446'], statusCode=3), 结果=无,有更多行=无)

标签: regexhivehiveqlcreate-tablehive-serde

解决方案


表位置似乎错误:/user/hive/warehouse - 这看起来像默认仓库目录。里面有一些目录。它失败了/user/hive/warehouse/categories,说这不是一个文件。看起来这是类别表目录。

在 /user/hive/warehouse 目录中创建一个文件夹并将文件放入其中。像这样:

/user/hive/warehouse/pricefiles/pricefile_edited.txt

更改 DDL 中的表位置:

LOCATION '/user/hive/warehouse/pricefiles

正则表达式不正确。每列应该在正则表达式中有相应的组(in parenthesis)。例如,第一列的正则表达式表示它是 12 个点.,因为\\.字面意思是点字符。如果你想要任何 12 个字符,它应该是 (.{12}) 没有两个斜杠。还要在组之间添加分隔符(空格或制表符或其他): (.{12})(.{1}) - 这将占用 140219078921B0 (140219078921) 中的 12 个字符并B作为第二列。相应地修复您的正则表达式,并在必要时在组之间添加空格(分隔符)。还要从正则表达式中删除额外的输入,将其写为单行。

您可以使用以下方法以简单的方式测试正则表达式regexp_extract(string, regexp, group_number)

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})',1); --extract group number 1 (group 0 is the whole regexp)
OK
140219078921
Time taken: 1.057 seconds, Fetched: 1 row(s)

hive> select regexp_extract('140219078921B0 A1DU1M 1223105DDB','(.{12})(.{1})',2); --extract group number 2
OK
B
Time taken: 0.441 seconds, Fetched: 1 row(s)

等等。添加更多组并仔细测试


推荐阅读