Spark GPUenabler ClassNotFoundException: CacheGPUDS

Problem Description

I am using the IBMSparkGPU/GPUenabler package. I build a single jar containing all dependencies with sbt assembly and submit it to the Spark standalone cluster manager. However, the following error message appears:

org.apache.spark.SparkException: Error sending message [message = CacheGPUDS(a99176e95cf37ba4e5e46b9b172369ac_-1728716590,false)]
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
    at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint.com$ibm$gpuenabler$GPUMemoryManagerMasterEndPoint$$tell(GPUMemoryManager.scala:172)
    at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$registerGPUMemoryManager$2.apply(GPUMemoryManager.scala:64)
    at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$registerGPUMemoryManager$2.apply(GPUMemoryManager.scala:64)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
    at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint.registerGPUMemoryManager(GPUMemoryManager.scala:64)
    at com.ibm.gpuenabler.GPUMemoryManagerMasterEndPoint$$anonfun$receiveAndReply$1.applyOrElse(GPUMemoryManager.scala:143)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:105)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
    ... 16 more

Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.ibm.gpuenabler.CacheGPUDS
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1866)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:259)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:308)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:258)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:257)
    at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:577)
    at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:562)
    at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:159)
    at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
    at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    ... more
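
For context, the failing submission presumably looked something like the sketch below; the main class, master URL, and jar name are placeholders rather than details from the original post.

    # Build a fat jar that bundles the application and the GPUenabler dependency
    sbt assembly

    # Submit it to the standalone cluster manager
    spark-submit \
      --class com.example.MyGpuApp \
      --master spark://master-host:7077 \
      target/scala-2.11/my-gpu-app-assembly.jar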

If I submit with local[*] as the master, this error message does not appear and the program runs. I can also eliminate the error by disabling spark.gpuenabler.autocache. Is there a way to fix this properly, though?
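
For reference, a minimal sketch of that autocache workaround, assuming spark.gpuenabler.autocache is an ordinary Spark configuration key that can be passed at submit time (the other names are placeholders):

    # Workaround: turn off GPUenabler's automatic caching for this job
    spark-submit \
      --class com.example.MyGpuApp \
      --master spark://master-host:7077 \
      --conf spark.gpuenabler.autocache=false \
      target/scala-2.11/my-gpu-app-assembly.jar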

I am using Ubuntu 17.04, JRE 1.8.0, Scala 2.11, and Spark 2.1.0.

Tags: scala, apache-spark, gpu

Solution


It turns out that adding the option --conf "spark.executor.extraClassPath=file://path/to/jar" solves the problem. One other thing: the jar file needs to be copied to the same path on every machine; otherwise the workers cannot find it.
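
Concretely, the corrected submission might look like the sketch below; the main class, master URL, and jar location are placeholders, and the assembly jar must exist at that same local path on every worker. A likely explanation is that the executor-side RPC layer deserializes the CacheGPUDS message with a class loader that only sees spark.executor.extraClassPath, not the application jar shipped by spark-submit, which is why the class has to be put on that extra classpath explicitly.

    # Copy the assembly jar to the same path on every machine first
    spark-submit \
      --class com.example.MyGpuApp \
      --master spark://master-host:7077 \
      --conf "spark.executor.extraClassPath=file://path/to/jar" \
      /path/to/jar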

