python - Training Yolov5 on an RTX 3060 Ti GPU, I get the error "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution"
Problem description
I am training Yolov5 on an RTX 3060 Ti GPU with --img 1088 and a batch size of 16, using the following command:
python train.py --img 1088 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --device 0 --workers 0
I get the exception "RuntimeError: Unable to find a valid cuDNN algorithm to run convolution". If I reduce the batch size to 8, the model trains. Traceback:
  File "train.py", line 611, in <module>
    main(opt)
  File "train.py", line 509, in main
    train(opt.hyp, opt, device)
  File "train.py", line 311, in train
    pred = model(imgs) # forward
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 123, in forward
    return self.forward_once(x, profile, visualize) # single-scale inference, train
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 155, in forward_once
    x = m(x) # run
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 137, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 423, in forward
    return self._conv_forward(input, self.weight)
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\conv.py", line 419, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
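The manual workaround of dropping from batch 16 to 8 can be automated by halving the batch size until a step succeeds. The following is a minimal sketch, not part of YOLOv5's train.py. It assumes a hypothetical run_step(batch) callable that performs one training step and raises RuntimeError on failure. It treats both the CUDA out-of-memory message and the cuDNN "no valid algorithm" message as memory problems, which is a heuristic.

```python
from typing import Callable, Optional


def is_memory_error(err: RuntimeError) -> bool:
    """Heuristic: treat CUDA OOM and the cuDNN 'no valid algorithm' error as memory problems."""
    msg = str(err)
    return "out of memory" in msg or "Unable to find a valid cuDNN algorithm" in msg


def largest_fitting_batch(run_step: Callable[[int], None], start: int, minimum: int = 1) -> Optional[int]:
    """Halve the batch size until run_step(batch) succeeds; return None if even `minimum` fails."""
    batch = start
    while batch >= minimum:
        try:
            run_step(batch)
            return batch
        except RuntimeError as err:
            if not is_memory_error(err):
                raise  # unrelated failure: do not hide it behind a smaller batch
            batch //= 2
    return None


def fake_step(batch: int) -> None:
    # Hypothetical stand-in for one forward/backward pass at --img 1088 on an 8 GiB card.
    if batch > 8:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


print(largest_fitting_batch(fake_step, start=16))  # prints 8
```

In a real PyTorch loop you would typically also release cached blocks with torch.cuda.empty_cache() between attempts, so that a failed large batch does not leave memory reserved.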
Additionally, with the image size kept at 640 and a batch size of 64:
python train.py --img 640 --batch 64 --epochs 3 --data coco128.yaml --weights yolov5s.pt --device 0 --workers 0
I get a different runtime error:
  File "train.py", line 611, in <module>
    main(opt)
  File "train.py", line 509, in main
    train(opt.hyp, opt, device)
  File "train.py", line 311, in train
    pred = model(imgs) # forward
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 123, in forward
    return self.forward_once(x, profile, visualize) # single-scale inference, train
  File "C:\Users\hamza.m\workspace\yolov5\models\yolo.py", line 155, in forward_once
    x = m(x) # run
  File "C:\Program Files\Python38\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\hamza.m\workspace\yolov5\models\common.py", line 137, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 5.48 GiB already allocated; 81.94 MiB free; 5.61 GiB reserved in total by PyTorch)
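As a rough first-order check, the three configurations above can be compared by pixels per batch. This assumes square img x img training inputs and that activation memory scales with batch x height x width. The approximation ignores model weights, optimizer state, cuDNN workspace and the CUDA context. pixel_load and equivalent_batch are hypothetical helpers for illustration only.

```python
def pixel_load(img: int, batch: int) -> int:
    """Pixels per batch for square img x img inputs (rough proxy for activation memory)."""
    return img * img * batch


def equivalent_batch(img: int, ref_img: int = 1088, ref_batch: int = 8) -> int:
    """Largest batch at `img` whose pixel load does not exceed the reference (working) config."""
    return pixel_load(ref_img, ref_batch) // (img * img)


configs = {
    "--img 1088 --batch 16 (cuDNN error)": (1088, 16),
    "--img 1088 --batch 8 (trains)": (1088, 8),
    "--img 640 --batch 64 (CUDA OOM)": (640, 64),
}
baseline = pixel_load(1088, 8)
for name, (img, batch) in configs.items():
    load = pixel_load(img, batch)
    print(f"{name}: {load:,} pixels, {load / baseline:.2f}x the working config")

print("batch at --img 640 with the same pixel load as 1088/8:", equivalent_batch(640))
```

Under this approximation, --img 1088 --batch 16 carries 2x the pixel load of the configuration that trains, and --img 640 --batch 64 about 2.8x, which is consistent with both runs failing. Treat the suggested batch of 23 at --img 640 as a starting point to test, not a guarantee.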
P.S. Could someone also explain how to evaluate which GPU is best suited for training my model? Any insight would be appreciated.
Solution
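The answer is in the error log:
RuntimeError: CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; 5.48 GiB already allocated; 81.94 MiB free; 5.61 GiB reserved in total by PyTorch)
PyTorch tried to allocate another 100 MiB, but only 81.94 MiB of the card's 8 GiB was still free. The batch does not fit in GPU memory, so lowering --batch (or --img) is the fix, as you found. The cuDNN error at --img 1088 --batch 16 is most likely the same memory shortage surfacing differently: when there is not enough free memory for any convolution algorithm's workspace, cuDNN can report that no valid algorithm was found.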
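The numbers in that log line can be checked directly. The following minimal sketch parses the message and converts every size to MiB. The regular expressions target the message format quoted above; the wording of PyTorch's out-of-memory message differs between versions, so treat the patterns as an assumption. The "outside PyTorch" figure is a rough estimate of memory that the caching allocator does not account for, presumably the CUDA context, cuDNN, and other processes such as the Windows display.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 100.00 MiB (GPU 0; 8.00 GiB total capacity; "
    "5.48 GiB already allocated; 81.94 MiB free; 5.61 GiB reserved in total by PyTorch)"
)

_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

_PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (KiB|MiB|GiB)",
    "total": r"([\d.]+) (KiB|MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (KiB|MiB|GiB) already allocated",
    "free": r"([\d.]+) (KiB|MiB|GiB) free",
    "reserved": r"([\d.]+) (KiB|MiB|GiB) reserved in total by PyTorch",
}


def parse_oom(message: str) -> dict:
    """Extract the sizes from a CUDA OOM message (format assumed as above), in MiB."""
    sizes = {}
    for key, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            value, unit = match.groups()
            sizes[key] = float(value) * _TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)
shortfall = sizes["requested"] - sizes["free"]  # how much the failed allocation was missing
outside = sizes["total"] - sizes["reserved"] - sizes["free"]  # not held by PyTorch's allocator
print(f"short by {shortfall:.2f} MiB")  # prints "short by 18.06 MiB"
print(f"about {outside:.0f} MiB in use outside PyTorch's caching allocator")
```

The shortfall is small, but this one allocation is only the first one that failed. The rest of the forward and backward pass would need far more, which is why a large reduction in batch size was needed rather than a small one.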