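scala - Spark HBase connector does not work in parallel mode?
Problem description
I am trying to use the Hortonworks HBase connector for Spark 2.0 (https://github.com/hortonworks-spark/shc/tree/v1.1.0-2.0) to work with HBase.
Using the example provided at the link above: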
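import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

val spark = SparkSession
  .builder()
  .appName(getClass.toString)
  .getOrCreate()

// Load an HBase table as a DataFrame using the given catalog (JSON string).
def withCatalog(cat: String, spark: SparkSession): DataFrame = {
  spark
    .read
    .options(Map(HBaseTableCatalog.tableCatalog -> cat))
    .format("org.apache.spark.sql.execution.datasources.hbase")
    .load()
}

val df = withCatalog(cat, spark)
df.printSchema()
df.show(20, false)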
Schema:
val cat =
  s"""{
     |"table":{"namespace":"test", "name":"test_src_data", "tableCoder":"PrimitiveType"},
     |"rowkey":"tfkod_description",
     |"columns":{
     |"col0":{"cf":"rowkey", "col":"tfkod_description", "type":"string"},
     |"src_stream_desc":{"cf":"src_data", "col":"src_desc", "type":"string"}
     |}
     |}""".stripMargin
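After I run spark2-submit, the job starts and prints only the schema. After that, all the executors exit and the job hangs forever.
The last message in the log:
Existing executor 41 has been removed (new total is 1)
However, I can use HBase successfully in a sequential fashion, i.e. with put or BulkPut, but not through RDDs or DataFrames in Spark (with any HBase connector).
Is there something wrong with the HBase/Spark configuration that prevents the Spark executors from working in parallel? Or is something missing on the worker nodes?
Error message from a worker: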
19/05/13 11:36:44 ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:642)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:166)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:769)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:766)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:766)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:920)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:889)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1222)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
at org.apache.hadoop.hbase.client.ClientSmallScanner$SmallScannerCallable.call(ClientSmallScanner.java:201)
at org.apache.hadoop.hbase.client.ClientSmallScanner$SmallScannerCallable.call(ClientSmallScanner.java:180)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
... 25 more
Solution
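The worker stack trace fails inside the HBase SASL handshake with "Failed to find any Kerberos tgt", which suggests the executors have no usable credentials for HBase, rather than a problem with parallelism as such. That would also be consistent with puts issued from the driver succeeding. The following is a minimal diagnostic sketch, not a confirmed fix. It assumes a Kerberized cluster and that the Hadoop client classes are on the executor classpath; the object name ExecutorCredentialCheck and the probe count are hypothetical. It reports, per executor host, the current Hadoop user, whether a Kerberos TGT is present, and which delegation token kinds were shipped to the executor.

```scala
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.sql.SparkSession
import scala.collection.JavaConverters._

// Hypothetical helper: prints the security context seen by the executors.
object ExecutorCredentialCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("ExecutorCredentialCheck")
      .getOrCreate()

    // Arbitrary number of probe tasks (an assumption); more tasks make it
    // more likely that every executor is sampled at least once.
    val probes = 50

    val report = spark.sparkContext
      .parallelize(1 to probes, probes)
      .map { _ =>
        // This closure runs on an executor, not on the driver.
        val ugi = UserGroupInformation.getCurrentUser
        val tokenKinds = ugi.getTokens.asScala
          .map(_.getKind.toString)
          .toSeq
          .sorted
          .mkString(",")
        (java.net.InetAddress.getLocalHost.getHostName,
         ugi.getUserName,
         ugi.hasKerberosCredentials,
         tokenKinds)
      }
      .distinct()
      .collect()

    report.foreach { case (host, user, hasTgt, kinds) =>
      println(s"host=$host user=$user hasKerberosCredentials=$hasTgt tokens=[$kinds]")
    }

    spark.stop()
  }
}
```

On YARN, executors normally authenticate with delegation tokens rather than a TGT, so hasKerberosCredentials=false is not an error by itself. If the token list contains no HBASE_AUTH_TOKEN entry, the HBase client on the executor has nothing to fall back on but Kerberos, which matches the trace above. In that case it may be worth checking that hbase-site.xml is visible to the driver at submit time and, for long-running jobs, trying spark-submit with --principal and --keytab; whether either applies to this cluster is not confirmed by the question.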