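hdfs - How do I access HDFS from Drill in embedded mode?
Question
I am trying to run Apache Drill on a single node, following an article on accessing HDFS from embedded Drill, but I get an error: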
➜ Apps /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=local;schema=dfs"
...
apache drill (dfs)> select * from dfs.`tmp/`;
Error: RESOURCE ERROR: Failed to load schema for "dfs"!
java.net.ConnectException: Call From HW04.ucera.local/172.18.4.49 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[Error Id: 2fd541ee-2290-4cf8-979b-aca3c77859e2 ] (state=,code=0)
apache drill (dfs)> !q
Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
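The `Connection refused` in the trace above means that nothing accepted a TCP connection on `localhost:8020` from the Drill host. A minimal Python sketch for checking that independently of Drill follows. The helper name `can_connect` is an assumption made for illustration, and the host and port are simply the ones from the error message.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Address taken from the error message; add the NameNode host of your
    # own cluster to this list to compare.
    for host, port in [("localhost", 8020)]:
        print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this reports that the address is not reachable, the `connection` value in the storage plugin is probably pointing at a host that does not run the NameNode.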
The dfs storage plugin configuration looks like this:
{
"type": "file",
"connection": "hdfs://localhost:8020/",
"config": null,
"workspaces": {
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
"formats": {
"psv": {
"type": "text",
"extensions": [
"tbl"
....
}
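(Note that I don't really know how to determine which port the HDFS connection should use.) The link in the error message (http://wiki.apache.org/hadoop/ConnectionRefused) also leads nowhere. Trying an alternative solution from another SO post throws a different error: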
➜ Apps /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:drillbit=localhost:31010;schema=dfs"
Error: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: CONNECTION : io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:31010 (state=,code=0)
java.sql.SQLNonTransientConnectionException: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: CONNECTION : io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:31010
at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:178)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
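I'm not sure what to check at this point. Any debugging suggestions or fixes?
Answer
In the end, what worked was setting the host in the HDFS connection string to the IP of the Hadoop cluster's namenode (taken from another SO post about connecting to HDFS), so the Drill dfs storage plugin configuration looks like this: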
{
"type": "file",
"connection": "hdfs://<namenode-ip>:8020/",
"config": null,
"workspaces": {
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
....
}
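On the question of which host and port to use: Hadoop clients read the default filesystem URI from the `fs.defaultFS` property in `core-site.xml` (older configurations use the deprecated `fs.default.name`). The sketch below pulls that value out of the file contents. The function name `read_default_fs` and the sample hostname are assumptions for illustration, and the location of `core-site.xml` depends on the installation.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_default_fs(core_site_xml: str) -> Optional[str]:
    """Return the default filesystem URI from core-site.xml text, or None."""
    root = ET.fromstring(core_site_xml)
    values = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            values[name.strip()] = value.strip()
    # fs.defaultFS is the current key; fs.default.name is its deprecated alias.
    return values.get("fs.defaultFS") or values.get("fs.default.name")


# Hypothetical sample; a real file comes from the cluster's Hadoop config dir.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(read_default_fs(SAMPLE))
```

On a machine with the Hadoop client installed, `hdfs getconf -confKey fs.defaultFS` reports the same value without any parsing. Whatever URI it returns is a reasonable candidate for the `connection` field of the plugin.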
With that in place, we can run:
➜ bin /home/hph_etl/Apps/apache-drill-1.16.0/bin/sqlline -u "jdbc:drill:zk=local;schema=dfs"
Apache Drill 1.16.0
"Got Drill?"
apache drill (dfs)> select * from dfs.`tmp/`;
Error: PERMISSION ERROR: Not authorized to read table [tmp/] in schema [dfs.default]
[Error Id: 2e248da5-ba30-43f7-a983-1784d77cf81b ] (state=,code=0)
apache drill (dfs)>
(Note that I now need to fix a permission error, but at least Drill can reach HDFS and attempt to query the location.)