Training FRCNN on PyTorch 1.0 with CUDA: Segmentation fault (core dumped)

Problem description

I am trying to train FRCNN on my custom VOC dataset using this repo (faster-rcnn.pytorch):

 ~/miniconda3/bin/python3 trainval_net.py --dataset pascal_voc --net res101 --cag --bs 1 --nw 1  --lr 1e-3 --lr_decay_step 5

Called with args:
Namespace(batch_size=1, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=True, cuda=False, dataset='pascal_voc', disp_interval=100, large_scale=False, lr=0.001, lr_decay_gamma=0.1, lr_decay_step=5, mGPUs=False, max_epochs=20, net='res101', num_workers=1, optimizer='sgd', resume=False, save_dir='models', session=1, start_epoch=1, use_tfboard=False)
/home/stiv/faster-rcnn.pytorch/lib/model/utils/config.py:374: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml_cfg = edict(yaml.load(f))
Using config:{ ... some long stuff here ...},
 'USE_GPU_NMS': True}
WARNING: You have a CUDA device, so you should probably run with --cuda
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/stiv/faster-rcnn.pytorch/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
before filtering, there are 5170 images...
after filtering, there are 5170 images...
5170 roidb entries
Loading pretrained weights from data/pretrained_model/resnet101_caffe.pth
Segmentation fault (core dumped)

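I have installed the required dependencies and the latest g++. CUDA also works on this machine (I am able to run the mmdetection package). How can I diagnose and fix this kind of crash?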
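A generic first diagnostic step, not specific to this repo: Python's standard-library faulthandler module dumps the Python traceback of the running threads when the interpreter receives SIGSEGV, so a bare "Segmentation fault (core dumped)" also shows which Python line was executing when the crash happened. The sketch below assumes Python 3.7+ on Linux. run_with_faulthandler is a hypothetical helper, and the deliberately crashing ctypes.string_at(0) call only stands in for the real trainval_net.py run.

```python
import subprocess
import sys


def run_with_faulthandler(script_args, timeout=None):
    """Run a Python child process with faulthandler enabled (hypothetical helper).

    `-X faulthandler` makes the interpreter print the Python traceback to
    stderr on fatal signals such as SIGSEGV. `script_args` is everything
    you would normally type after `python3`.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


# Stand-in for the real training command: reading memory at address 0
# through ctypes crashes CPython with SIGSEGV.
crash_args = ["-c", "import ctypes; ctypes.string_at(0)"]
result = run_with_faulthandler(crash_args)

# On POSIX, a child killed by a signal reports returncode == -signal number.
print("return code:", result.returncode)
# stderr holds "Fatal Python error: Segmentation fault" plus the traceback.
print(result.stderr)
```

For the actual run, the same flag can be passed directly on the command line, e.g. ~/miniconda3/bin/python3 -X faulthandler trainval_net.py --dataset pascal_voc --net res101 --cag --bs 1 --nw 1 --lr 1e-3 --lr_decay_step 5, or enabled by setting the environment variable PYTHONFAULTHANDLER=1. If the crash happens inside a compiled extension, the traceback points at the Python call that entered it, which at least narrows down which step (for example, the pretrained-weight loading that is the last thing logged) to investigate.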

Tags: python, pytorch, faster-rcnn

Solution

