首页 > 解决方案 > 在 AWS EC2 上加载 torch.hub.load('pytorch/fairseq', 'roberta.large.mnli') 时出错

问题描述

我正在尝试在 AWS 上的 EC2 实例上使用 Torch(和 Roberta 语言模型)运行一些代码。编译似乎失败了,有人有修复的指针吗?

确认 Torch 已正确安装

import torch
a = torch.rand(5,3)
print (a)

返回:张量([[0.7494, 0.5213, 0.8622],...

尝试加载罗伯塔

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
Using cache found in /home/ubuntu/.cache/torch/hub/pytorch_fairseq_master
/home/ubuntu/.local/lib/python3.8/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
fatal: not a git repository (or any of the parent directories): .git
running build_ext
/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/cpp_extension.py:352: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
skipping 'fairseq/data/data_utils_fast.cpp' Cython extension (up-to-date)
skipping 'fairseq/data/token_block_utils_fast.cpp' Cython extension (up-to-date)
building 'fairseq.libnat' extension
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/TH -I/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/THC -I/usr/include/python3.8 -c fairseq/clib/libnat/edit_dist.cpp -o build/temp.linux-x86_64-3.8/fairseq/clib/libnat/edit_dist.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=libnat -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
In file included from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/ATen/Parallel.h:149,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/utils.h:3,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:5,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:3,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:12,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/torch.h:3,
                 from fairseq/clib/libnat/edit_dist.cpp:9:
/home/ubuntu/.local/lib/python3.8/site-packages/torch/include/ATen/ParallelOpenMP.h:84: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
   84 | #pragma omp parallel for if ((end - begin) >= grain_size)

然后结束,过了很久。

x86_64-linux-gnu-gcc: fatal error: Killed signal terminated program cc1plus compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

标签: amazon-ec2pytorchtorchroberta-language-model

解决方案


通过在本地而不是从集线器加载预训练模型来使其工作。

from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('roberta.large.mnli', 'model.pt', '/home/ubuntu/deployedapp/roberta.large')
roberta.eval()

请注意,我必须使用 XLarge EC2 实例来运行它,否则进程会因内存不足而被终止。


推荐阅读