MATLAB: Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED during transfer learning

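Problem description

I get this error even though training starts, and I have not found any answer that deals with EXECUTION_FAILED specifically. Training on the GPU is what is recommended, yet the training process is very slow. If you can help, please explain in detail.

The specs I am using: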

CPU = Core-i7 9th Gen Hexacore
RAM = 16GB
GPU = Nvidia GTX 1660Ti 6-GB
MATLAB = R2018b Version

Code:

options = trainingOptions('sgdm', ...
    'MiniBatchSize',32, ...
    'MaxEpochs',10, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
try
    net.internal.cnngpu.reluForward(1);
catch ME
end

netTransfer = trainNetwork(augimdsTrain,layers,options);

Error details:

Warning: The CUDA driver must recompile the GPU libraries because your device is more recent than the
libraries. Recompiling can take several minutes. Learn more. 
> In parallel.internal.gpu.selectDevice
  In parallel.gpu.GPUDevice.current (line 44)
  In gpuDevice (line 23)
  In nnet.internal.cnn.util.isGPUCompatible (line 10)
  In nnet.internal.cnn.util.GPUShouldBeUsed (line 17)
  In nnet.internal.cnn.assembler.setupExecutionEnvironment (line 24)
  In trainNetwork>doTrainNetwork (line 171)
  In trainNetwork (line 148)
  In viperMat (line 45) 
Error using trainNetwork (line 150)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.

Tags: matlab, transfer-learning

Solution


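So, the slow performance was caused by passing too large a batch size; reducing the batch size made training faster (though it still does not compare with the Python libraries). As for the error, you can re-run the code a few times until it goes away, or you can simply add the line below to your code to suppress it at startup.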

warning off parallel:gpu:device:DeviceLibsNeedsRecompiling
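
The two suggestions above (a smaller batch size and silencing the recompilation warning) could be combined roughly as follows. This is only a sketch under stated assumptions: the batch size of 16 is an example value that does not come from the original post, the GPU warm-up uses a plain gpuArray operation in place of the internal call shown in the question, and the validation options are left out so that the snippet does not depend on augimdsValidation.

```matlab
% Sketch only: smaller mini-batch plus a GPU warm-up before training.
% The batch size of 16 is an assumed example value, not from the original post.

% Silence the recompilation warning quoted in the error details.
warning('off', 'parallel:gpu:device:DeviceLibsNeedsRecompiling');

% Touch the GPU once so that any one-time library recompilation happens
% here rather than inside trainNetwork.
if gpuDeviceCount > 0
    try
        warmup = eye(2, 'gpuArray')^2; %#ok<NASGU>
    catch ME
        % Report the failure instead of swallowing it silently.
        fprintf('GPU warm-up failed: %s\n', ME.message);
    end
end

miniBatchSize = 16;  % assumed value; try halving it again if training is still slow
options = trainingOptions('sgdm', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',10, ...
    'InitialLearnRate',1e-4, ...
    'Shuffle','every-epoch', ...
    'Verbose',false);
```

The resulting options can then be passed to trainNetwork(augimdsTrain,layers,options) exactly as in the question. Whether a smaller batch is actually faster depends on the network and on available GPU memory, so it is worth timing a few values.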

Hope this helps anyone with a similar problem.

