Getting "IOException: Broken pipe" when submitting a Spark job whose pyspark code connects to HBase

Problem description

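I submitted a Spark job that uses pyspark's newAPIHadoopRDD to do something simple; the job connects to HBase while it runs. Our CDH cluster has Kerberos enabled, but I believe I have already authenticated.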
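For readers unfamiliar with this call, here is a minimal sketch of how an HBase table is typically read through newAPIHadoopRDD. It is not the asker's script, which is not reproduced in this post. The helper names `build_hbase_conf` and `read_hbase_table` are hypothetical, the ZooKeeper host in the usage note is a placeholder, and the converter classes are assumed to be available from the Spark examples jar on the job's classpath.

```python
# Hypothetical sketch, NOT the asker's code: reading an HBase table
# through SparkContext.newAPIHadoopRDD with TableInputFormat.

def build_hbase_conf(zk_quorum, table):
    """Build the Hadoop configuration dict that TableInputFormat reads."""
    return {
        "hbase.zookeeper.quorum": zk_quorum,    # ZooKeeper hosts used by HBase
        "hbase.mapreduce.inputtable": table,    # table to scan
    }


def read_hbase_table(sc, hbaseconf):
    """Return an RDD of (row key, result) string pairs for the configured table.

    `sc` is an existing SparkContext. The converter classes below are assumed
    to come from the Spark examples jar and must be on the classpath.
    """
    return sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=hbaseconf,
    )
```

With a live SparkContext `sc`, this would be called as `read_hbase_table(sc, build_hbase_conf("zk1.example.com", "event_opinion_type"))`. The table name is the one that appears in the exception below; the host name is made up.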

Below are my code, the submit command, the exception, and some of the CM configuration.

CM configuration

pyspark code

Submit shell command

> 

"19/01/16 10:55:42 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36850456cea05e5
19/01/16 10:55:42 INFO zookeeper.ZooKeeper: Session: 0x36850456cea05e5 closed
Traceback (most recent call last):
  File "/home/xxx/xxx/xxx_easy_hbase.py", line 36, in <module>
    conf=hbaseconf)
  File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 644, in newAPIHadoopRDD
  File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError19/01/16 10:55:42 INFO zookeeper.ClientCnxn: EventThread shut down
: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=32, exceptions:
Wed Jan 16 10:55:42 CST 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68449: row 'event_opinion_type,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx.com,60020,1547543835462, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:320)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:247)
        at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:62)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
        at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
        at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
        at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
        at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
        ...
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68449: row 'event_opinion_type,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=xxx.com,60020,1547543835462, seqNum=0
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:169)
        at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        ... 4 more"

Tags: apache-spark, pyspark, hbase, cloudera-cdh

Solution

