首页 > 解决方案 > 有没有关于这个问题的 cuda 支持的 dl4j 解决方案?

问题描述

我正在尝试执行 MultiGpuLenetMnistExample.java

我收到以下错误

“……

12:41:24.129 [main] INFO Test - Load data....
12:41:24.716 [main] INFO Test - Build model....
12:41:25.500 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
ND4J CUDA build version: 10.1.243
CUDA device 0: [Quadro K4000]; cc: [3.0]; Total memory: [3221225472];
12:41:26.692 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for OpenMP: 32
12:41:26.746 [main] INFO org.nd4j.nativeblas.Nd4jBlas - Number of threads used for OpenMP BLAS: 0
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 8.1]
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [24]; Memory: [3,5GB];
12:41:26.755 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
12:41:26.755 [main] INFO org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner - Device Name: [Quadro K4000]; CC: [3.0]; Total/free memory: [3221225472]
12:41:26.844 [main] INFO org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
12:41:27.957 [main] DEBUG org.nd4j.jita.allocator.impl.MemoryTracker - Free memory on device_0: 2709856256
Exception in thread "main" java.lang.RuntimeException: cudaGetSymbolAddress(...) failed; Error code: [13]
    at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.createShapeInfo(CudaExecutioner.java:2557)
    at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3282)
    at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:76)
    at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:96)
    at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:77)
    at org.nd4j.linalg.jcublas.CachedShapeInfoProvider.createShapeInformation(CachedShapeInfoProvider.java:44)
    at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:211)
    at org.nd4j.linalg.jcublas.JCublasNDArray.<init>(JCublasNDArray.java:383)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1543)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1538)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4298)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3986)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:688)
    at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:604)
    at Test.main(Test.java:80)

Process finished with exit code 1 "

有没有解决这个问题的方法?

标签: javacudnndeeplearning4jdl4j

解决方案


这里有 2 个选项:要么从源代码构建 dl4j 以获得目标计算能力 (3.0),要么等待下一个版本,因为我们将把它带回来 1 个额外的版本。

此时 cc 3.0 只是被大多数框架认为已弃用 afaik


推荐阅读