Docker image differs when run from different hosts

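Problem description

While building a third-party library (libtorch, if it matters) inside a Docker container, I hit an error about a missing include file. The same build works when run from an Ubuntu 16.04 host, but the file is missing when run from an Ubuntu 18.04 host.

After some tracing, I am now just running the base container from NVIDIA and looking for the file. This is the output I get: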

Ubuntu 16.04 host

$ uname -a
Linux ub-carmel 4.15.0-123-generic #126~16.04.1-Ubuntu SMP Wed Oct 21 13:48:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ docker --version
Docker version 19.03.13, build 4484c46d9d

$ docker pull  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04

11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Image is up to date for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@2ecc17248fab:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
-rw-r--r-- 1 root root   7817 Dec  4  2019 ia32intrin.h

Ubuntu 18.04 host

$ uname -a
Linux ub-carmel-18-04 5.4.0-56-generic #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ docker --version
Docker version 19.03.14, build 5eb3275d40

$ docker pull  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04

11.1-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
Digest: sha256:c5bf5c984998cc18a3f3a741c2bd7187ed860dc6d993b6fb402d0effb9fe6579
Status: Downloaded newer image for nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
$ docker run -it  nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
root@89f771e82a51:/# ll /usr/lib/gcc/x86_64-linux-gnu/7/include | grep ia32
root@89f771e82a51:/#

As you can see, the image's sha256 digest is identical on both hosts (and matches the digest listed on NVIDIA's NGC).
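
A matching digest shows that both hosts pulled the same image manifest, but it does not by itself prove that the unpacked files on disk are identical. A minimal sketch for comparing what each host actually sees inside the container is shown below. It assumes find and sha256sum are available in the image, and manifest is a hypothetical helper name, not part of Docker.

```shell
# manifest DIR: print "sha256  ./relative/path" for every regular file under DIR.
# The output is sorted by path in the C locale so that listings produced on
# two different hosts can be compared line by line with diff.
manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 )
}
```

Pasting the function into the container shell on each host and running manifest /usr/lib/gcc/x86_64-linux-gnu/7/include > /tmp/include.manifest produces one listing per host; running diff on the two listings would then show every file that is missing or altered, not only ia32intrin.h.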

At first I thought the include files might somehow be coming from the host in a hidden way, but ia32intrin.h exists on both hosts.
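
One way to rule out the host-leak hypothesis more directly is to list what is mounted into the running container. The sketch below wraps docker inspect with a Go template; show_mounts is a hypothetical helper name, and the container ID passed to it is whatever docker ps reports on that host.

```shell
# show_mounts CONTAINER: print the bind mounts and volumes attached to a
# container as JSON. An empty list ("[]") means nothing from the host
# filesystem is mounted into the container.
show_mounts() {
    docker inspect --format '{{json .Mounts}}' "$1"
}
```

For example, show_mounts 89f771e82a51 run on the 18.04 host while the container is still up. A container started with a plain docker run -it and no -v or --mount options would be expected to show no mounts.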

What could cause a problem like this?

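Edit

Added the output of docker --version for each host. The versions differ, but I doubt that would cause a problem like this.

Edit 2

Added the output of uname -a.

Edit 3

Output of docker version: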

Ubuntu 16

$ docker version
Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:59 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:30 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.7
  GitCommit:        8fba4e9a7d01810a393d5d25a3621dc101981175
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Ubuntu 18

$ docker version
Client: Docker Engine - Community
 Version:           19.03.14
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        5eb3275d40
 Built:             Tue Dec  1 19:20:17 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.14
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       5eb3275d40
  Built:            Tue Dec  1 19:18:45 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

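I then tested this on different Ubuntu machines (EC2 instances), and there the file is present on both 18.04 and 16.04. So this looks like a problem with my machine. Any ideas what could cause it?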

Tags: c++, docker, gcc

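Solution

The best guess is that the pulled layers on the Ubuntu 18.04 host are corrupted somehow. The nuclear option for cleaning this up is to reset Docker. This deletes everything: all images, volumes, containers, logs, networks, and so on, so back up anything you want to keep before running: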

sudo -s # these commands need root
systemctl stop docker
rm -rf /var/lib/docker
systemctl start docker
exit # exit sudo
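
Before resorting to the full reset, it may be worth trying a narrower step first: removing only the affected image and pulling it again. This is a sketch under the assumption that the damage is confined to layers used by this one image; repull_image is a hypothetical helper name. Note that docker rmi refuses to remove an image that is still used by a container, including stopped ones, so those need to be removed with docker rm first.

```shell
# repull_image IMAGE: remove the local copy of IMAGE and download it again.
# docker rmi keeps any layer that another local image still references, so a
# damaged shared layer would survive this and still require the full reset.
repull_image() {
    docker rmi "$1" && docker pull "$1"
}
```

For example, repull_image nvcr.io/nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04, followed by repeating the ll check inside a fresh container. If the file is still missing afterwards, the full reset above remains the fallback.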
