hadoop - Unable to query data from a Druid datasource through a Hive external table
Problem
Our Druid cluster and our Hive/Hadoop cluster each run fine on their own. We are creating a table in Hive that reads data from Druid (for ETL purposes), but in initial testing we found that we cannot even run a simple SELECT * against it; it fails with the following error:
hive> select * from druid_hive_table;
OK
druid_hive_table.__time druid_hive_table.op_ts druid_hive_table.op_type druid_hive_table.pos druid_hive_table.table
Failed with exception java.io.IOException:org.apache.hive.druid.com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of java.util.ArrayList out of START_OBJECT token
at [Source: org.apache.hive.druid.com.metamx.http.client.io.AppendableByteArrayInputStream@656c5818; line: -1, column: 4]
Time taken: 0.449 seconds
However, a SELECT COUNT(*) works fine!
hive> select count(*) from druid_hive_table;
OK
$f0
21409
Time taken: 0.199 seconds, Fetched: 1 row(s)
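One way to read this asymmetry: Hive can push COUNT(*) down to Druid as an aggregation query, whereas SELECT * is translated into a Druid select/scan query, and the Jackson error above says the Hive-side client hit a JSON object (a START_OBJECT token) at a position where it expected an array. The sketch below is a hypothetical Python illustration of that kind of shape mismatch; the payloads are invented for the example and are not actual Druid responses.

```python
import json

# Hypothetical illustration (not Hive/Druid code): Jackson's
# "Can not deserialize instance of java.util.ArrayList out of START_OBJECT"
# means the deserializer expected a JSON array but found a JSON object.

# Shape a client might expect: a top-level list of rows.
expected_shape = '[{"__time": "2020-02-10T00:00:00Z", "op_type": "I"}]'

# Shape actually received: a top-level object wrapping the rows.
received_shape = '{"events": [{"__time": "2020-02-10T00:00:00Z"}]}'

def rows_from_response(body: str) -> list:
    """Mimic a deserializer that insists on a top-level JSON array."""
    parsed = json.loads(body)
    if not isinstance(parsed, list):
        # Analogous to Jackson failing on a START_OBJECT token.
        raise TypeError("expected a JSON array, got %s" % type(parsed).__name__)
    return parsed

print(len(rows_from_response(expected_shape)))  # 1
try:
    rows_from_response(received_shape)
except TypeError as exc:
    print("deserialization failed:", exc)
```

If this is the cause, it points to a response-format mismatch between the Druid version and the Hive Druid storage handler rather than a problem with the data itself.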
Specs:
Druid external table
SET hive.druid.broker.address.default=<host>:8082;
CREATE EXTERNAL TABLE druid_hive_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_datasource_name");
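Before suspecting the table definition itself, it can help to verify that the broker is reachable from the Hive host and that the datasource is visible to it. Assuming Druid's standard broker REST endpoints (the host below is a placeholder):

```shell
# Broker health check (standard Druid status endpoint).
curl http://<host>:8082/status

# List the datasources the broker can see; the value of the table's
# "druid.datasource" property should appear in this list.
curl http://<host>:8082/druid/v2/datasources
```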
hive> DESCRIBE FORMATTED druid_hive_table;
OK
col_name data_type comment
# col_name data_type comment
__time timestamp from deserializer
op_ts string from deserializer
op_type string from deserializer
pos string from deserializer
table string from deserializer
# Detailed Table Information
Database: tests
Owner: OWNER
CreateTime: Mon Feb 10 13:52:13 UTC 2020
LastAccessTime: UNKNOWN
Retention: 0
Location: <LOCATION>
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
EXTERNAL TRUE
druid.datasource druid_datasource_name
numFiles 0
numRows 0
rawDataSize 0
storage_handler org.apache.hadoop.hive.druid.DruidStorageHandler
totalSize 0
transient_lastDdlTime 1581342733
# Storage Information
SerDe Library: org.apache.hadoop.hive.druid.serde.DruidSerDe
InputFormat: null
OutputFormat: null
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.144 seconds, Fetched: 37 row(s)
For reference, the Druid supervisor spec:
{
"dataSchema": {
"dataSource": "druid_datasource_name",
"timestampSpec": {
"column": "current_ts",
"format": "iso",
"missingValue": null
},
"dimensionsSpec": {
"dimensions": [],
"dimensionExclusions": [
"current_ts"
]
},
"metricsSpec": [],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": {
"type": "none"
},
"rollup": false,
"intervals": null
},
"transformSpec": {
"filter": null,
"transforms": []
}
},
"ioConfig": {
"topic": "<kafka_topic>",
"inputFormat": {
"type": "json",
"flattenSpec": {
"useFieldDiscovery": true,
"fields": []
},
"featureSpec": {}
},
"replicas": 1,
"taskCount": 1,
"taskDuration": "PT3600S",
"consumerProperties": {
"bootstrap.servers": "<bootstrap_servers>",
"group.id": "<group_name>",
"security.protocol": "SASL_SSL",
"ssl.truststore.location": "<location>",
"ssl.truststore.password": "<pass>",
"sasl.jaas.config": "<config>",
"sasl.mechanism": "SCRAM-SHA-512"
},
"pollTimeout": 100,
"startDelay": "PT5S",
"period": "PT30S",
"useEarliestOffset": true,
"completionTimeout": "PT1800S",
"lateMessageRejectionPeriod": null,
"earlyMessageRejectionPeriod": null,
"lateMessageRejectionStartDateTime": null,
"stream": "<kafka_topic>",
"useEarliestSequenceNumber": true,
"type": "kafka"
},
"tuningConfig": {
"type": "kafka",
"maxRowsInMemory": 1000000,
"maxBytesInMemory": 0,
"maxRowsPerSegment": 5000000,
"maxTotalRows": null,
"intermediatePersistPeriod": "PT10M",
"basePersistDirectory": "/opt/apache-druid-0.17.0/var/tmp/druid-realtime-persist7801461398656096281",
"maxPendingPersists": 0,
"indexSpec": {
"bitmap": {
"type": "concise"
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs"
},
"indexSpecForIntermediatePersists": {
"bitmap": {
"type": "concise"
},
"dimensionCompression": "lz4",
"metricCompression": "lz4",
"longEncoding": "longs"
},
"buildV9Directly": true,
"reportParseExceptions": false,
"handoffConditionTimeout": 0,
"resetOffsetAutomatically": false,
"segmentWriteOutMediumFactory": null,
"workerThreads": null,
"chatThreads": null,
"chatRetries": 8,
"httpTimeout": "PT10S",
"shutdownTimeout": "PT80S",
"offsetFetchPeriod": "PT30S",
"intermediateHandoffPeriod": "P2147483647D",
"logParseExceptions": false,
"maxParseExceptions": 2147483647,
"maxSavedParseExceptions": 0,
"skipSequenceNumberAvailabilityCheck": false,
"repartitionTransitionDuration": "PT120S"
},
"type": "kafka"
}
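For completeness, a spec like the one above is typically submitted (or re-submitted after changes) to the Overlord's supervisor endpoint. Assuming the default Overlord port and a local file spec.json holding the spec, both placeholders here:

```shell
# Submit the Kafka supervisor spec to the Overlord.
curl -X POST -H 'Content-Type: application/json' \
  -d @spec.json \
  http://<overlord-host>:8090/druid/indexer/v1/supervisor
```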
Thanks in advance for any help with this.
Solution
I managed to solve this: upgrading Hive and Hadoop to version 3+ fixed the problem.
After the upgrade, the same statements work like a charm:
SET hive.druid.broker.address.default=<host>:8082;
CREATE EXTERNAL TABLE druid_hive_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_datasource_name");