Abnormal error when installing CUDA in a Docker container

Problem description

I am running Docker 18 CE on Ubuntu 18.04. The base image of my container is also Ubuntu 18.04. I am trying to build a custom Docker image in which I can also run and use NVIDIA and CUDA. The only problem occurs while installing CUDA.

This is the code that fetches the executable:

RUN wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run RUN chmod +x cuda_9.0.176_384.81_linux-run 
RUN ./cuda_9.0.176_384.81_linux-run

When the Docker build reaches the installation, specifically this step:

Step 16/33 : RUN ./cuda_9.0.176_384.81_linux-run
 ---> Running in dcca0c9973cc
The command line turns completely blank. I don't see an error message or the process logs you would usually get. The screen just stays black. I've let it run for hours and there is no response.

My Dockerfile is as follows:

FROM ubuntu:18.04

RUN apt-get update -y && apt-get install -y \
 build-essential \
 curl \
 apt-utils \
 python \
 python-dev \
 python-pip \
 python3 \
 python3-dev \
 python3-pip \
 swig \
 unzip \
 sox \
 libsox-dev \
 python-pyaudio \
 git \
 wget \
 silversearcher-ag \
 ranger \
 ffmpeg \
 python3-levenshtein \
 python-numpy \
 libcurl3-dev  \
 ca-certificates \
 gcc-6 \
 g++-6 \
 libsox-fmt-mp3 \
 htop \
 nano \
 cmake \
 zlib1g-dev \
 libbz2-dev \
 liblzma-dev \
 locales \
 pkg-config \
 libsox-dev \
 freeglut3-dev \
 libx11-dev \
 libxmu-dev \
 libxi-dev \
 libglu1-mesa \
 libglu1-mesa-dev \
 dpkg

RUN DEBIAN_FRONTEND=noninteractive apt-get install keyboard-configuration

WORKDIR /home/setup/

RUN wget https://github.com/bazelbuild/bazel/releases/download/0.15.0/bazel-0.15.0-installer-linux-x86_64.sh
RUN chmod +x bazel-0.15.0-installer-linux-x86_64.sh
RUN ./bazel-0.15.0-installer-linux-x86_64.sh
RUN rm bazel-0.15.0-installer-linux-x86_64.sh


# Install NVIDIA

#RUN sudo echo "blacklist nouveau" >> /etc/modprobe.d/blacklist-nouveau.conf
#RUN sudo echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf

#RUN add-apt-repository ppa:graphics-drivers
RUN apt-get update
RUN apt install nvidia-driver-390 -y

# Install Python specific packages

RUN pip3 install --upgrade setuptools pip wheel

RUN pip3 install absl-py==0.9.0 \
astor==0.8.1 \
attrdict==2.0.1 \
audioread==2.1.8 \
cffi==1.13.2 \
cycler==0.10.0 \
Cython==0.29.14 \
decorator==4.4.1 \
deepspeech==0.4.1 \
gast==0.3.3 \
grpcio==1.26.0 \
h5py==2.10.0 \
joblib==0.14.1 \
Keras-Applications==1.0.8 \
Keras-Preprocessing==1.1.0 \
kiwisolver==1.1.0 \
librosa==0.7.2 \
llvmlite==0.31.0 \
Markdown==3.1.1 \
matplotlib==3.1.2 \
numba==0.47.0 \
numexpr==2.7.1 \
numpy==1.18.1 \
pandas==0.25.3 \
progressbar==2.5 \
protobuf==3.11.2 \
pycparser==2.19 \
pydub==0.23.1 \
pyparsing==2.4.6 \
python-dateutil==2.8.1 \
python-Levenshtein==0.12.0 \
python-speech-features==0.6 \
pytz==2019.3 \
resampy==0.2.2 \
scikit-learn==0.22.1 \
scipy==1.4.1 \
six==1.14.0 \
SoundFile==0.10.3.post1 \
tables==3.6.1 \
tensorboard==1.12 \
tensorflow-gpu==1.12.0 \
Werkzeug==0.16.0

## Install Cuda

RUN wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run RUN chmod +x cuda_9.0.176_384.81_linux.run 
RUN ./cuda_9.0.176_384.81_linux.run
#RUN rm cuda_9.0.176_384.81_linux.run

## Install cudnn

RUN wget https://www.dropbox.com/s/o0ffjf1j0bftrq9/cudnn-9.2-linux-x64-v7.2.1.38.tgz?dl=1 -O cudnn-9.2-linux-x64-v7.2.1.38.tgz
RUN tar -xzvf cudnn-9.2-linux-x64-v7.2.1.38.tgz
RUN cp -P cuda/include/cudnn.h /usr/local/cuda-9.0/include
RUN cp -P cuda/lib64/libcudnn* /usr/local/cuda-9.0/lib64/
RUN chmod a+r /usr/local/cuda-9.0/lib64/libcudnn*

RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.0/lib64/
RUN export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/


ENTRYPOINT /bin/bash

Tags: docker, docker-compose, dockerfile

Solution


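This happens because --silent is missing from your Dockerfile. Without it, the CUDA installer waits for you to accept the EULA, and a Docker build has no terminal to answer from. It also looks like you lost a line break (RUN chmod ended up on the same line as the wget command) and the file names do not match: the download is saved as cuda_9.0.176_384.81_linux-run, but the later steps refer to cuda_9.0.176_384.81_linux.run.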

RUN wget -q https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run -O cuda_9.0.176_384.81_linux.run
RUN chmod +x cuda_9.0.176_384.81_linux.run
RUN ./cuda_9.0.176_384.81_linux.run --silent
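
The same fix could also be written as a single layer, so that a failed download stops the build and the installer file is removed in the layer that created it. This is only a sketch. The --toolkit flag is an assumption added here and is not part of the answer above: it asks the runfile to install only the CUDA toolkit and skip the bundled driver, which is usually what you want when the driver is provided by the host.

```dockerfile
# Sketch only: download, install silently, and clean up in one layer.
# --silent implies acceptance of the EULA, so the installer never waits for input.
# --toolkit is an assumption (not in the original answer): install the toolkit only.
RUN wget -q https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run \
        -O cuda_9.0.176_384.81_linux.run \
 && chmod +x cuda_9.0.176_384.81_linux.run \
 && ./cuda_9.0.176_384.81_linux.run --silent --toolkit \
 && rm cuda_9.0.176_384.81_linux.run
```

Chaining with && keeps the file name consistent across the steps, which avoids the linux-run versus linux.run mismatch from the question.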
