Importing pytorch-encoding fails during docker build on macOS Big Sur

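Problem description

I have a Docker image based on the pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel image. It builds successfully on macOS Catalina but fails to build on macOS Big Sur. The failure happens when a Python script runs import encoding (see the stack trace below). The problem seems to be related to a Python deep learning library that wraps a C++ implementation.

I would rather not patch the C++ implementation by hand; I would prefer a more general solution.

I tried running the project without Docker (on the Big Sur machine) and it worked. I also tried building the image on another machine running macOS Catalina, and it still builds successfully there.

I really need to build it in Docker on Big Sur. Can anyone help?

Details: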

Dockerfile:

FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         curl \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/* && apt-get clean

ENV PATH="/miniconda/bin:$PATH"
RUN curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh >> miniconda.sh \
    && bash ./miniconda.sh -b -p /miniconda; rm ./miniconda.sh

WORKDIR /opt/app

COPY conda.yaml ./

RUN apt-get -y update && apt-get install -y build-essential cmake \
    && conda env update --prefix /miniconda --file conda.yaml \
    && conda clean -tipsy \
    && rm -rf /var/lib/apt/lists/* && apt-get clean \
    && rm -rf ~/.cache/pip

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/* && apt-get clean

COPY app/preload_model.py ./

RUN python preload_model.py

preload_model.py

import encoding  # fails here

encoding.models.get_model('DeepLab_ResNeSt200_ADE', pretrained=True)
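Since the import is what triggers the just-in-time compilation of the C++ extension, one experiment is to cap the number of parallel compiler processes before that line runs. The sketch below is only an assumption-laden starting point: it assumes that the installed torch.utils.cpp_extension honours the MAX_JOBS environment variable when it invokes ninja, and that the cc1plus processes in the error log were killed for lack of memory inside the container, which the log alone does not prove. The helper name limit_extension_build_jobs is hypothetical.

```python
import os


def limit_extension_build_jobs(max_jobs: int = 1) -> str:
    """Cap parallel compile jobs for PyTorch JIT extension builds.

    Assumption: torch.utils.cpp_extension reads the MAX_JOBS environment
    variable and passes it to ninja as the job count. This helper is a
    hypothetical name, not part of PyTorch or torch-encoding.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    # Must be set before `import encoding`, because the build starts at import time.
    os.environ["MAX_JOBS"] = str(max_jobs)
    return os.environ["MAX_JOBS"]
```

In preload_model.py the call would go on the line before import encoding. If the build then succeeds, memory pressure from parallel compilation is a plausible cause, and raising the memory available to Docker on the Big Sur machine may be an alternative worth checking.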

conda.yaml

channels:
  - defaults
  - pytorch
dependencies:
  - python=3.8
  - pytorch=1.6.0
  - cudatoolkit=10.1
  - scipy=1.5.2
  - Flask=1.1.2
  - gunicorn=20.0.4
  - torchvision=0.7.0
  - Pillow=7.2.0
  - requests=2.24.0
  - numpy=1.19.1
  - ca-certificates
  - certifi
  - pip=20.2.2
  - pip:
      - brotlipy==0.7.0
      - chardet==3.0.4
      - click==7.1.2
      - future==0.18.2
      - itsdangerous==1.1.0
      - Jinja2==2.11.2
      - nose==1.3.7
      - opencv-python==4.4.0.44
      - portalocker==2.0.0
      - six==1.15.0
      - torch-encoding==1.2.1
      - tqdm==4.50.0
      - Werkzeug==1.0.1
      - imagehash==4.2.0
      - Flask-Caching==1.9.0

Error:

> [10/17] RUN python preload_model.py:
#14 1.680 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#14 69.42 Traceback (most recent call last):
#14 69.42   File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1509, in _run_ninja_build
#14 69.42     subprocess.run(
#14 69.42   File "/miniconda/lib/python3.8/subprocess.py", line 516, in run
#14 69.42     raise CalledProcessError(retcode, process.args,
#14 69.42 subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
#14 69.42 
#14 69.42 During handling of the above exception, another exception occurred:
#14 69.42 
#14 69.42 Traceback (most recent call last):
#14 69.42   File "preload_model.py", line 1, in <module>
#14 69.42     import encoding
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/__init__.py", line 13, in <module>
#14 69.42     from . import nn, functions, parallel, utils, models, datasets, transforms
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/nn/__init__.py", line 12, in <module>
#14 69.42     from .encoding import *
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/nn/encoding.py", line 18, in <module>
#14 69.42     from ..functions import scaled_l2, aggregate, pairwise_cosine
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/functions/__init__.py", line 2, in <module>
#14 69.42     from .encoding import *
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/functions/encoding.py", line 14, in <module>
#14 69.42     from .. import lib
#14 69.42   File "/miniconda/lib/python3.8/site-packages/encoding/lib/__init__.py", line 9, in <module>
#14 69.42     cpu = load('enclib_cpu', [
#14 69.42   File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 963, in load
#14 69.42     return _jit_compile(
#14 69.42   File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1170, in _jit_compile
#14 69.42     _write_ninja_file_and_build_library(
#14 69.42   File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1276, in _write_ninja_file_and_build_library
#14 69.42     _run_ninja_build(
#14 69.42   File "/miniconda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1529, in _run_ninja_build
#14 69.42     raise RuntimeError(message)
#14 69.42 RuntimeError: Error building extension 'enclib_cpu': [1/7] c++ -MMD -MF encoding_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/encoding_cpu.cpp -o encoding_cpu.o 
#14 69.42 [2/7] c++ -MMD -MF rectify_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/rectify_cpu.cpp -o rectify_cpu.o 
#14 69.42 FAILED: rectify_cpu.o 
#14 69.42 c++ -MMD -MF rectify_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/rectify_cpu.cpp -o rectify_cpu.o 
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [3/7] c++ -MMD -MF nms_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/nms_cpu.cpp -o nms_cpu.o 
#14 69.42 FAILED: nms_cpu.o 
#14 69.42 c++ -MMD -MF nms_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/nms_cpu.cpp -o nms_cpu.o 
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [4/7] c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o 
#14 69.42 FAILED: syncbn_cpu.o 
#14 69.42 c++ -MMD -MF syncbn_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/syncbn_cpu.cpp -o syncbn_cpu.o 
#14 69.42 c++: internal compiler error: Killed (program cc1plus)
#14 69.42 Please submit a full bug report,
#14 69.42 with preprocessed source if appropriate.
#14 69.42 See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
#14 69.42 [5/7] c++ -MMD -MF roi_align_cpu.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp -o roi_align_cpu.o 
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9:0,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/

.....

roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:354:7: note: declared here
#14 69.42    T * data() const {
#14 69.42        ^~~~
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:9:0,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:472:34: warning: ‘T* at::Tensor::data() const [with T = float]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
#14 69.42        bottom_rois.data<scalar_t>(),
#14 69.42                                   ^
#14 69.42 In file included from /miniconda/lib/python3.8/site-packages/torch/include/ATen/Tensor.h:3:0,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/ATen/Context.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/ATen/ATen.h:5,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
#14 69.42                  from /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/roi_align_cpu.cpp:1:
#14 69.42 /miniconda/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:354:7: note: declared here
#14 69.42    T * data() const {
#14 69.42        ^~~~
#14 69.42 [6/7] c++ -MMD -MF operator.o.d -DTORCH_EXTENSION_NAME=enclib_cpu -DTORCH_API_INCLUDE_EXTENSION_H -isystem /miniconda/lib/python3.8/site-packages/torch/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /miniconda/lib/python3.8/site-packages/torch/include/TH -isystem /miniconda/lib/python3.8/site-packages/torch/include/THC -isystem /miniconda/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /miniconda/lib/python3.8/site-packages/encoding/lib/cpu/operator.cpp -o operator.o 
#14 69.42 ninja: build stopped: subcommand failed.
#14 69.42 
------
executor failed running [/bin/sh -c python preload_model.py]: exit code: 1
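The log above mixes deprecation warnings with the actual failures, so it can help to extract only the targets that ninja reported as FAILED and to count how often the compiler was killed. The following is a minimal sketch, assuming the log was saved as text in the format shown above (lines optionally prefixed with a step number and timestamp such as "#14 69.42"); the function name summarize_ninja_failures is hypothetical.

```python
import re

# Docker build output prefixes each line with "#<step> <seconds> "; strip it if present.
_PREFIX = re.compile(r"^#\d+\s+[\d.]+\s?")


def summarize_ninja_failures(log_text: str) -> dict:
    """Collect failed ninja targets and count compiler processes that were killed."""
    failed_targets = []
    killed_count = 0
    for raw_line in log_text.splitlines():
        line = _PREFIX.sub("", raw_line).strip()
        if line.startswith("FAILED:"):
            # Lines look like "FAILED: rectify_cpu.o"
            failed_targets.append(line[len("FAILED:"):].strip())
        elif "internal compiler error: Killed" in line:
            # The compiler process was terminated from outside, not by a source error.
            killed_count += 1
    return {"failed_targets": failed_targets, "killed_count": killed_count}
```

Run against the log in this question, this would list rectify_cpu.o, nms_cpu.o and syncbn_cpu.o as failed, each accompanied by a "Killed (program cc1plus)" message, while roi_align_cpu.cpp only produced warnings. A compiler that is killed rather than reporting a source error is often a sign of the container running out of memory, although the log does not confirm that.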

Tags: docker, pytorch, macos-big-sur
