python - RuntimeError: cuda runtime error (48): no kernel image is available for execution on the device at mmdet/ops/roi_align/src/roi_align_kernel.cu:139
Problem description
I'm having a bit of trouble running my code on a Google Compute Engine VM.
I'm trying to run a small Flask API that detects tables in images. Initializing the detector model works, but when I try to detect tables this error occurs:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "ElvyCascadeTabNetAPI.py", line 36, in detect_tables
result = inference_detector(model, "temp.jpg")
File "/SingleModelTest/src/mmdet/mmdet/apis/inference.py", line 86, in inference_detector
result = model(return_loss=False, rescale=True, **data)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/core/fp16/decorators.py", line 49, in new_func
return old_func(*args, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/models/detectors/base.py", line 149, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/models/detectors/base.py", line 130, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/models/detectors/cascade_rcnn.py", line 342, in simple_test
x[:len(bbox_roi_extractor.featmap_strides)], rois)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/core/fp16/decorators.py", line 127, in new_func
return old_func(*args, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/models/roi_extractors/single_level.py", line 105, in forward
roi_feats_t = self.roi_layers[i](feats[i], rois_)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/SingleModelTest/src/mmdet/mmdet/ops/roi_align/roi_align.py", line 144, in forward
self.sample_num, self.aligned)
File "/SingleModelTest/src/mmdet/mmdet/ops/roi_align/roi_align.py", line 36, in forward
spatial_scale, sample_num, output)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at mmdet/ops/roi_align/src/roi_align_kernel.cu:139
While searching for possible solutions I came across several Stack Overflow questions where the problem was an old, unsupported GPU, so I changed the GPU on my Google Compute Engine VM from an Nvidia Tesla K80 to a newer Nvidia Tesla T4. The K80 has CUDA compute capability 3.7, while the new T4 has 7.5, so I figured that would solve the problem, but it didn't.
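For context, the compatibility rule behind error 48 can be sketched in a few lines: a compiled binary can run a kernel only if it contains a native kernel image (cubin) for the device's exact compute capability, or PTX for an equal-or-lower capability that the driver can JIT-compile forward. This is a minimal illustration, not mmdetection's actual loader logic:

```python
# Sketch of the rule behind CUDA error 48 ("no kernel image is
# available"): native cubin must match the device capability exactly;
# embedded PTX is forward-compatible and can be JIT-compiled by the
# driver for any equal-or-newer capability.

def can_run(device_cc, cubin_archs, ptx_archs=()):
    """device_cc: (major, minor), e.g. (7, 5) for a Tesla T4.
    cubin_archs: capabilities the binary has native kernels for.
    ptx_archs: capabilities the binary ships PTX for."""
    if device_cc in cubin_archs:
        return True  # native kernel image available
    # PTX for capability X can be JIT-compiled for any device >= X
    return any(ptx <= device_cc for ptx in ptx_archs)

# A build targeting only the K80 (sm_37) has no image for a T4 (7.5):
assert can_run((3, 7), cubin_archs=[(3, 7)]) is True
assert can_run((7, 5), cubin_archs=[(3, 7)]) is False
# Shipping PTX for 3.7 would let the driver JIT a kernel for 7.5:
assert can_run((7, 5), cubin_archs=[(3, 7)], ptx_archs=[(3, 7)]) is True
```

Swapping in a newer GPU therefore doesn't help by itself: the compiled extension still lacks a kernel image for the new capability until it is rebuilt.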
Output of `nvidia-smi`:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 72C P8 12W / 70W | 106MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 918 G /usr/lib/xorg/Xorg 95MiB |
| 0 N/A N/A 974 G /usr/bin/gnome-shell 9MiB |
+-----------------------------------------------------------------------------+
Output of `nvcc --version`:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Torch version: torch 1.4.0+cu100
Torchvision version: 0.5.0+cu100
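Note that the `+cu100` suffix is pip's local-version tag indicating which CUDA toolkit the wheel was built against (10.0 here), while `nvcc --version` above reports 10.1. A small hypothetical helper showing how to read that tag:

```python
import re

def cuda_tag_to_version(torch_version):
    """Extract the CUDA version a PyTorch wheel was built against from
    its local version tag, e.g. '1.4.0+cu100' -> '10.0'.
    (Hypothetical helper name; the tag convention itself is how
    PyTorch wheels encode their CUDA build.)"""
    m = re.search(r"\+cu(\d+)", torch_version)
    if not m:
        return None  # CPU-only build or untagged version
    digits = m.group(1)
    # 'cu100' -> '10.0', 'cu101' -> '10.1', 'cu92' -> '9.2'
    return f"{digits[:-1]}.{digits[-1]}"

assert cuda_tag_to_version("1.4.0+cu100") == "10.0"
assert cuda_tag_to_version("1.4.0+cu101") == "10.1"
assert cuda_tag_to_version("1.4.0") is None
```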
I run the API in a Docker container; my Dockerfile:
# Dockerfile
FROM nvidia/cuda:10.0-devel
RUN nvidia-smi
RUN set -xe \
&& apt-get update \
&& apt-get install python3-pip -y \
&& apt-get install git -y \
&& apt-get install libgl1-mesa-glx -y
RUN pip3 install --upgrade pip
WORKDIR /SingleModelTest
COPY requirements /SingleModelTest/requirements
# ENV rather than `RUN export`: an export in a RUN step does not persist into later layers
ENV LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64
RUN pip3 install -r requirements/requirements1.txt
RUN pip3 install -r requirements/requirements2.txt
COPY . /SingleModelTest
ENTRYPOINT ["python3"]
CMD ["TabNetAPI.py"]
Edit: I was confused by the `nvidia-smi` output because it shows a higher CUDA version than the one I installed, but it turns out that is normal: https://medium.com/@brianhourigan/if-different-cuda-versions-are-shown-nvcc-and-nvidia-smi-its-necessarily-not-a-problem-and-311eda26856c
If anyone has a solution I would greatly appreciate it. If I need to provide more information, I'm happy to do so.
Thanks in advance.
Solution
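A common fix for this class of error, offered here as a hedged sketch rather than a verified answer: the mmdet CUDA extensions (including `roi_align_kernel.cu`) were compiled without kernels for the T4's compute capability 7.5, so they must be rebuilt with `TORCH_CUDA_ARCH_LIST` covering it. In the Dockerfile from the question, that could look like the following fragment (the env value is an assumption; `requirements2.txt` is taken from the question and is assumed to be the step that builds mmdet):

```
# Compile CUDA kernels for the T4 (sm_75), keeping 3.7 for the old K80.
# Must be set before the pip step that builds mmdet's extensions.
ENV TORCH_CUDA_ARCH_LIST="3.7;7.5"
RUN pip3 install -r requirements/requirements2.txt
```

After changing the GPU, the image also needs to be rebuilt without the Docker build cache, so that the previously compiled extension (built for the K80) is not reused.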