Amazon SageMaker ModelError when serving a model

Problem description

I uploaded a transformers RoBERTa model to an S3 bucket. I am now trying to run inference on it with PyTorch through the SageMaker Python SDK. I pointed the model data at s3://snet101/sent.tar.gz, which is an archive of the model (pytorch_model.bin) and all of its dependencies. Here is the code:

from sagemaker.pytorch import PyTorchModel
from sagemaker.utils import name_from_base

# model_artifact, role and the SentimentAnalysis predictor class are defined
# earlier in the notebook.
model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('roberta-model'),
                     role=role,
                     entry_point='torchserve-predictor2.py',
                     source_dir='source_dir',
                     framework_version='1.4.0',
                     py_version='py3',
                     predictor_cls=SentimentAnalysis)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
test_data = {"text": "How many cows are in the farm ?"}
prediction = predictor.predict(test_data)

I get the following error when calling the predictor object's predict method:

ModelError                                Traceback (most recent call last)
<ipython-input-6-bc621eb2e056> in <module>
----> 1 prediction = predictor.predict(test_data)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
    123 
    124         request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 125         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    126         return self._handle_response(response)
    127 

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/roberta-model-2020-12-16-09-42-37-479 in account 165258297056 for more information.

I checked the server logs and found this error:

java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Can't load config for '/.sagemaker/mms/models/model'. Make sure that:
'/.sagemaker/mms/models/model' is a correct model identifier listed on 'https://huggingface.co/models'
or '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file

How can I fix this?

Tags: amazon-s3, pytorch, amazon-sagemaker, huggingface-transformers

Solution


I had the same problem: it looks like the endpoint is trying to load a pretrained model from the path '/.sagemaker/mms/models/model' and failing.
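That message is the one Transformers raises from from_pretrained when it cannot find a config.json in the given directory. The actual torchserve-predictor2.py is not shown in the question, so the following is only a rough sketch, under the assumption that the entry point defines a model_fn along these lines; it can only succeed if config.json and pytorch_model.bin sit directly in the directory the container extracts the archive into:

# Hypothetical sketch of a model_fn for torchserve-predictor2.py; the real
# entry point is not shown in the question, so the names here are assumptions.
from transformers import RobertaForSequenceClassification, RobertaTokenizer

def model_fn(model_dir):
    # model_dir is the directory holding the extracted contents of sent.tar.gz
    # (here /.sagemaker/mms/models/model). from_pretrained() raises the
    # "Can't load config" error unless config.json is directly inside it.
    tokenizer = RobertaTokenizer.from_pretrained(model_dir)
    model = RobertaForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer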

Perhaps that path is incorrect, or the S3 bucket cannot be accessed, so the model never ends up at the given path. The Hugging Face message in the log also suggests that whatever gets extracted there does not contain a config.json at the expected location.
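One concrete thing to check, then, is the layout of sent.tar.gz itself: SageMaker extracts the archive and hands the resulting directory to the loading code, so config.json and pytorch_model.bin need to be at the top level rather than nested inside a subfolder. A minimal sketch for inspecting a local copy of the archive (assuming sent.tar.gz has been downloaded to the working directory):

# Sketch: list the archive contents to check that config.json sits at the top level.
# Assumes a local copy of sent.tar.gz; adjust the path as needed.
import tarfile

with tarfile.open("sent.tar.gz", "r:gz") as tar:
    for member in tar.getnames():
        print(member)

# config.json and pytorch_model.bin should print as "config.json" and
# "pytorch_model.bin", not as "some_subfolder/config.json".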

