Segmentation fault from libtorch_cuda_cpp.so (PyTorch)

Problem description

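I am trying to run https://pytorch.org/tutorials/beginner/fgsm_tutorial.html. Everything works for the first 651 images, and then the process dies with a segmentation fault. I checked memory usage on the GPU (GTX 1050) and it looks fine. I also ran the same code on a friend's GTX 1050 Ti, where it runs without problems. I reinstalled Ubuntu and did a clean setup of the driver and the CUDA tools, but the problem persists. I ran the code under the GNU debugger (gdb), and this is what I get after 651 images: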

Thread 12 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffedcdb5700 (LWP 8697)]
0x00007fff32aa46ca in std::_Hashtable<at::native::ConvolutionParams, std::pair<at::native::ConvolutionParams const, cudnnConvolutionBwdDataAlgoPerf_t>, std::allocator<std::pair<at::native::ConvolutionParams const, cudnnConvolutionBwdDataAlgoPerf_t> >, std::__detail::_Select1st, at::native::ParamsEqual<at::native::ConvolutionParams>, at::native::ParamsHash<at::native::ConvolutionParams>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node(unsigned long, at::native::ConvolutionParams const&, unsigned long) const ()
   from /home/muco/.local/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so
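The gdb frame above names the native function that crashed but not the Python line that called into it. As a complementary check, a minimal sketch using the standard-library `faulthandler` module can write the Python-level traceback when SIGSEGV arrives. The loop below is only a hypothetical stand-in for the tutorial's per-image loop, and the log file name is an arbitrary choice:

```python
import faulthandler

# Keep the file object alive for the lifetime of the process: faulthandler
# writes to its file descriptor at crash time.
crash_log = open("segfault_traceback.log", "w")
faulthandler.enable(file=crash_log, all_threads=True)


def process_images(num_images):
    """Hypothetical stand-in for the tutorial's per-image attack loop."""
    processed = 0
    for index in range(num_images):
        # The real loop would run the forward pass, the backward pass and the
        # FGSM perturbation here; printing the index shows how far it got.
        print(f"processing image {index}", flush=True)
        processed += 1
    return processed


process_images(3)
```

Running the unmodified script as `python3 -X faulthandler script.py` enables the same handler without code changes, with the traceback going to stderr instead of a file.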

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

$ nvidia-smi
Thu May 20 16:12:04 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   57C    P0    N/A /  N/A |   1123MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       849      G   /usr/lib/xorg/Xorg                204MiB |
|    0   N/A  N/A      1332      G   budgie-wm                          23MiB |
|    0   N/A  N/A      1673      G   ...AAAAAAAAA= --shared-files       58MiB |
|    0   N/A  N/A      6801      G   ...AAAAAAAAA= --shared-files       61MiB |
|    0   N/A  N/A      8679      C   /usr/bin/python3                  769MiB |

cuDNN version: 8.1.0

Ubuntu version: 20.04

GCC version: 9.3.0

Python version: 3.8.5
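The versions listed above are the system-wide ones. The traceback points into the `torch` package under `site-packages`, and pip wheels of PyTorch typically ship their own CUDA runtime and cuDNN libraries, so the versions PyTorch actually loads may differ from what `nvcc` reports. A minimal sketch for printing what PyTorch itself reports follows; `describe_environment` is a hypothetical helper name, and the function falls back gracefully if PyTorch cannot be imported:

```python
import platform


def describe_environment():
    """Collect the versions PyTorch itself reports, if PyTorch is importable."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info

    info["torch"] = torch.__version__
    # CUDA version the installed build targets (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    # cuDNN version as an integer, or None if cuDNN is unavailable.
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Comparing this output between the failing machine and the one where the tutorial runs cleanly would show whether the two installations really load the same CUDA and cuDNN builds.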

Is this a hardware problem, or some kind of software bug?

Tags: python, c++, pytorch

Solution

