python - Runtime error during training with CUDA: Edge-Conditioned Convolution on Graphs
Problem description
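I am relatively new to Python and am currently trying to use CUDA with a specific neural network, Edge-Conditioned Convolution on Graphs. The code is available here: https://github.com/mys007/ecc
I know there are several questions similar to mine, but none of them has solved my problem.
I want to train on a dataset with CUDA, but the process stops during training, at a random epoch, with the following error: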
  File "./main.py", line 317, in <module>
    main()
  File "./main.py", line 219, in main
    acc_train, loss, t_loader, t_trainer = train(epoch)
  File "./main.py", line 150, in train
    outputs = model(inputs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ECC_Test/models.py", line 105, in forward
    input = module(input)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/ECC_Test/ecc/GraphConvModule.py", line 173, in forward
    return GraphConvFunction(self._in_channels, self._out_channels, idxn, idxe, degs, degs_gpu, self._edge_mem_limit)(input, weights)
  File "/workspace/ECC_Test/ecc/GraphConvModule.py", line 69, in forward
    cuda_kernels.conv_aggregate_fw(output.narrow(0,startd,numd), products.view(-1,self._out_channels), self._degs_gpu.narrow(0,startd,numd))
  File "/workspace/ECC_Test/ecc/cuda_kernels.py", line 122, in conv_aggregate_fw
    block=(CUDA_NUM_THREADS,1,1), grid=(GET_BLOCKS(w),n//blockDimY+1,1), stream=stream)
  File "cupy/cuda/function.pyx", line 148, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 130, in cupy.cuda.function._launch
  File "cupy/cuda/driver.pyx", line 228, in cupy.cuda.driver.launchKernel
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 192, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 81, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
The traceback above was produced with CUDA_LAUNCH_BLOCKING=1.
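CUDA kernels launch asynchronously. Without this variable, an error can therefore surface at a later, unrelated call. As an alternative to setting it on the command line (`CUDA_LAUNCH_BLOCKING=1 python main.py`), it can be set from inside the script. The following is a minimal sketch. It assumes the code runs before torch or cupy first initialise CUDA. The helper name `enable_blocking_launches` is hypothetical.

```python
import os
import sys


def enable_blocking_launches():
    """Make CUDA kernel launches synchronous so an error is raised by the call that caused it."""
    if "torch" in sys.modules or "cupy" in sys.modules:
        # If CUDA was already initialised, the setting may no longer take effect.
        print("warning: CUDA_LAUNCH_BLOCKING set after importing torch/cupy",
              file=sys.stderr)
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


# Hypothetical placement: the very top of main.py, before torch/cupy are imported.
enable_blocking_launches()
```

Blocking launches slow training down noticeably, so this is meant for debugging runs only.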
Switching to the CPU and disabling CUDA works fine.
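Raw CUDA kernels do no bounds checking, and the CPU path runs without errors. So one possibility worth ruling out is that some batch contains graph index data that the custom aggregation kernel reads past the end of a buffer. The check below is a hypothetical sanity test and is not part of ECC. It assumes, without confirmation from the ECC source, that `idxn` (the name used at `GraphConvModule.py` line 173) holds one source-node index per edge. It also assumes `degs` holds one incoming-edge count per output node, so `sum(degs)` must equal the number of edges.

```python
import numpy as np


def check_graph_indices(idxn, degs, num_input_nodes):
    """Hypothetical pre-launch sanity check for one batch of graph index data.

    Assumes idxn holds one source-node index per edge and degs holds the
    number of incoming edges per output node (so sum(degs) == len(idxn)).
    Raises ValueError describing the first inconsistency found.
    """
    idxn = np.asarray(idxn, dtype=np.int64)
    degs = np.asarray(degs, dtype=np.int64)
    # Every neighbour index must address an existing row of the input features.
    if idxn.size and (idxn.min() < 0 or idxn.max() >= num_input_nodes):
        raise ValueError(
            f"idxn outside [0, {num_input_nodes}): min={idxn.min()}, max={idxn.max()}")
    if (degs < 0).any():
        raise ValueError("degs contains negative entries")
    # The degrees must account for exactly as many edges as there are indices.
    if int(degs.sum()) != idxn.size:
        raise ValueError(f"sum(degs)={int(degs.sum())} but there are {idxn.size} edges")
```

One hypothetical placement is just before the `GraphConvFunction(...)` call, for example `check_graph_indices(idxn.cpu().numpy(), degs.cpu().numpy(), input.size(0))`. A batch that raises here, only occasionally, would match the "random epoch" symptom.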
I am accessing, via SSH, a server with four Nvidia Tesla V100 32GB GPUs (driver version 410.104). CUDA 10.1 and Python 3.6.8 are installed.
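The toolkit version installed on the machine is not necessarily the CUDA version PyTorch and CuPy were built against. The installed driver must also support those versions. NVIDIA's release notes list 418.39 as the minimum Linux driver for CUDA 10.1, so the 410.104 driver is worth checking. The sketch below collects these facts in one place. It only calls standard version attributes and skips any library that is not installed. The function name `report_versions` is hypothetical.

```python
import importlib
import platform


def report_versions():
    """Return a dict describing the Python/PyTorch/CuPy/CUDA stack actually in use."""
    info = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with (None on CPU builds)
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["gpus"] = [torch.cuda.get_device_name(i)
                            for i in range(torch.cuda.device_count())]
    try:
        cupy = importlib.import_module("cupy")
    except ImportError:
        info["cupy"] = None
    else:
        info["cupy"] = cupy.__version__
        # Both values are encoded as 1000*major + 10*minor, e.g. 10010 for CUDA 10.1.
        info["cupy_cuda_runtime"] = cupy.cuda.runtime.runtimeGetVersion()
        info["driver_max_cuda"] = cupy.cuda.runtime.driverGetVersion()  # highest CUDA the driver supports
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```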
PyTorch is currently at version 1.1. Could a higher PyTorch version cause problems in combination with CUDA 10.1? Or am I running out of memory on the GPU?
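Running out of GPU memory normally surfaces as an allocation failure rather than as `CUDA_ERROR_ILLEGAL_ADDRESS`. In PyTorch this is a `RuntimeError` reporting "CUDA out of memory", and in CuPy it is an `OutOfMemoryError`. An illegal address usually means a kernel read or wrote outside its buffer. Logging peak usage per epoch is still a cheap way to rule memory out. The sketch below uses PyTorch's allocator counters, which cover only memory allocated through PyTorch and not CuPy's own memory pool. The helper name `gpu_memory_report` is hypothetical.

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None

MIB = 1024 ** 2


def gpu_memory_report(device=0):
    """Return current/peak/total memory in MiB for one GPU, or None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(device)
    return {
        "allocated_mib": torch.cuda.memory_allocated(device) / MIB,
        "peak_allocated_mib": torch.cuda.max_memory_allocated(device) / MIB,
        "total_mib": props.total_memory / MIB,
    }
```

For example, calling `print(epoch, gpu_memory_report())` at the end of each epoch in `main.py` shows whether the peak ever approaches the 32 GB of a V100 before the crash.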
Solution