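apache-spark - Getting "IOException: Broken pipe" when submitting a Spark job that connects to HBase from PySpark code
Problem description
I submitted a Spark job that uses PySpark's newAPIHadoopRDD to do something simple; the job connects to HBase while it runs. Our CDH cluster has Kerberos enabled, but I believe I have already authenticated.
Below are my code, the shell command, the exception, and some CM configuration.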
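For readers who have not used this API, the following is a minimal sketch of what a newAPIHadoopRDD call against HBase typically looks like. It is not the asker's code. The helper names `build_hbase_conf` and `read_hbase_table`, the ZooKeeper host, and the Kerberos realm are hypothetical; the table name is taken from the log below. The Kerberos-related properties are settings a client commonly needs on a secured cluster, shown here as an assumption and not as a confirmed fix for this error. The two converter classes ship with the Spark examples jar, which would have to be on the job's classpath.

```python
def build_hbase_conf(zk_quorum, table, kerberos_realm=None):
    """Build the Hadoop configuration dict passed to newAPIHadoopRDD.

    zk_quorum and kerberos_realm are hypothetical placeholders;
    replace them with the values of your own cluster.
    """
    conf = {
        "hbase.zookeeper.quorum": zk_quorum,
        # Property read by TableInputFormat to select the table to scan.
        "hbase.mapreduce.inputtable": table,
    }
    if kerberos_realm is not None:
        # Assumption: the cluster uses the conventional hbase/_HOST principals.
        principal = "hbase/_HOST@" + kerberos_realm
        conf.update({
            "hadoop.security.authentication": "kerberos",
            "hbase.security.authentication": "kerberos",
            "hbase.master.kerberos.principal": principal,
            "hbase.regionserver.kerberos.principal": principal,
        })
    return conf


def read_hbase_table(sc, conf):
    """Return an RDD of (row key, cell string) pairs for the configured table.

    sc is an existing pyspark SparkContext.
    """
    return sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=conf,
    )


if __name__ == "__main__":
    # Printing the dict needs no cluster; the read itself needs a live SparkContext.
    print(build_hbase_conf("zk1.example.com", "event_opinion_type", "EXAMPLE.COM"))
```

Keeping the configuration in a plain dict makes it easy to print and compare against the cluster's hbase-site.xml before the job is submitted.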
"19/01/16 10:55:42 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36850456cea05e5
19/01/16 10:55:42 INFO zookeeper.ZooKeeper: Session: 0x36850456cea05e5 closed
Traceback (most recent call last):
File "/home/xxx/xxx/xxx_easy_hbase.py", line 36, in <module>
conf=hbaseconf)
File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 644, in newAPIHadoopRDD
File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError19/01/16 10:55:42 INFO zookeeper.ClientCnxn: EventThread shut down
: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=32, exceptions:
Wed Jan 16 10:55:42 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68449: row 'event_opinion_type,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx.com,60020,1547543835462, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:320)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:247)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:62)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
...
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68449: row 'event_opinion_type,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx.com,60020,1547543835462, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:169)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
... 4 more"
Solution