scikit-learn - SageMaker batch transform "ValueError: could not convert string to float"
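Problem description
I am running a local transformer using SageMaker batch transform. However, the transform does not seem to call my custom code.
Here is the SKLearn estimator setup: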
from sagemaker.sklearn.estimator import SKLearn

# role and sagemaker_session are defined earlier (not shown)
source_dir = 'train'
script_path = 'train.py'

sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="local_gpu",
    source_dir=source_dir,
    role=role,
    sagemaker_session=sagemaker_session)

sklearn.fit({'train': "file://test.csv"})
train.py is a Python script that loads the training data and saves the model to S3.
The batch transform is:
transformer = sklearn.transformer(
    instance_count=1,
    entry_point=source_dir + "/" + script_path,
    instance_type='local_gpu',
    strategy='MultiRecord',
    assemble_with='Line')
transformer.transform("file://test_messages", content_type='text/csv', split_type='Line')
print('Waiting for transform job: ' + transformer.latest_transform_job.job_name)
transformer.wait()
file://test_messages contains a CSV file that is a list of strings.
The full error is:
algo-1-6c5rl_1 | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "GET /ping HTTP/1.1" 200 0 "-" "-"
algo-1-6c5rl_1 | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "GET /execution-parameters HTTP/1.1" 404 232 "-" "-"
algo-1-6c5rl_1 | 2020-01-30 14:14:30,846 ERROR - train - Exception on /invocations [POST]
algo-1-6c5rl_1 | Traceback (most recent call last):
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
algo-1-6c5rl_1 | return fn(*args, **kwargs)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 56, in default_input_fn
algo-1-6c5rl_1 | return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
algo-1-6c5rl_1 | ValueError: could not convert string to float: 'IMPORTANT - You could be entitled up to �3,160 in compensation from mis-sold PPI on a credit card or loan. Please reply PPI for info or STOP to opt out.'
algo-1-6c5rl_1 |
algo-1-6c5rl_1 | During handling of the above exception, another exception occurred:
algo-1-6c5rl_1 |
algo-1-6c5rl_1 | Traceback (most recent call last):
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 2446, in wsgi_app
algo-1-6c5rl_1 | response = self.full_dispatch_request()
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1951, in full_dispatch_request
algo-1-6c5rl_1 | rv = self.handle_user_exception(e)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1820, in handle_user_exception
algo-1-6c5rl_1 | reraise(exc_type, exc_value, tb)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
algo-1-6c5rl_1 | raise value
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1949, in full_dispatch_request
algo-1-6c5rl_1 | rv = self.dispatch_request()
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/flask/app.py", line 1935, in dispatch_request
algo-1-6c5rl_1 | return self.view_functions[rule.endpoint](**req.view_args)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_transformer.py", line 200, in transform
algo-1-6c5rl_1 | self._model, request.content, request.content_type, request.accept
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_transformer.py", line 227, in _default_transform_fn
algo-1-6c5rl_1 | data = self._input_fn(content, content_type)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 95, in wrapper
algo-1-6c5rl_1 | six.reraise(error_class, error_class(e), sys.exc_info()[2])
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/six.py", line 692, in reraise
algo-1-6c5rl_1 | raise value.with_traceback(tb)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_containers/_functions.py", line 93, in wrapper
algo-1-6c5rl_1 | return fn(*args, **kwargs)
algo-1-6c5rl_1 | File "/miniconda3/lib/python3.7/site-packages/sagemaker_sklearn_container/serving.py", line 56, in default_input_fn
algo-1-6c5rl_1 | return np_array.astype(np.float32) if content_type in content_types.UTF8_TYPES else np_array
algo-1-6c5rl_1 | sagemaker_containers._errors.ClientError: could not convert string to float: 'IMPORTANT - You could be entitled up to �3,160 in compensation from mis-sold PPI on a credit card or loan. Please reply PPI for info or STOP to opt out.'
algo-1-6c5rl_1 | 172.18.0.1 - - [30/Jan/2020:14:14:30 +0000] "POST /invocations HTTP/1.1" 500 290 "-" "-"
.Waiting for transform job: sagemaker-scikit-learn-2020-01-30-14-14-30-490
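It seems it cannot handle my strings. I do have code in train.py that converts the strings with TfidfVectorizer, but that code is never called.
Solution
I am an engineer on AWS SageMaker. Thanks for providing the details of your Estimator/Transformer setup along with the full error log.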
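Looking at the specific error, the Scikit-learn container fails inside default_input_fn. Fortunately, SageMaker Scikit-learn is open source, so we can look directly at the source code, sagemaker_sklearn_container/serving.py#L56, to understand how it works.
Before sending the input to your model, the container processes it with a "default" input function. As the astype line in the traceback shows, for text/csv this function casts the decoded payload to float32, so the default implementation clearly does not work for rows of free text like yours.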
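To see the failure in isolation, here is a minimal reproduction outside SageMaker. It only mimics the float32 cast shown on the astype line of the traceback (the real CSV decoding lives in the container source), the helper name cast_like_default_input_fn is hypothetical, and it assumes NumPy is installed.

```python
import numpy as np


def cast_like_default_input_fn(rows):
    """Mimic the astype(np.float32) step from the traceback (hypothetical helper)."""
    return np.asarray(rows).astype(np.float32)


# Numeric CSV rows survive the cast...
print(cast_like_default_input_fn([["1.5", "2"]]))

# ...but free-text rows raise the same kind of ValueError as in the log.
try:
    cast_like_default_input_fn([["Please reply PPI for info or STOP to opt out."]])
except ValueError as err:
    print("ValueError:", err)
```

This is why the request never reaches the model: the input step fails before any of your vectorization code could run.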
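As with training, you need to provide custom Python code to control how SageMaker Scikit-learn handles your model in serving/inference mode. To override the default, implement input_fn in your custom Python code. You can either add it to your train.py script or pass a different Python file to the Transformer.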
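A minimal sketch of such an override, assuming the batch input is one message per line (split_type='Line') and that train.py saved a fitted scikit-learn Pipeline (TfidfVectorizer plus a classifier) as "model.joblib"; that file name and Pipeline layout are assumptions, not taken from the original code. The function names model_fn, input_fn and predict_fn follow the SageMaker Scikit-learn serving conventions (model_fn is needed so the container can load the model); the parsing details are illustrative.

```python
import csv
import io
import os


def model_fn(model_dir):
    """Load the fitted model written by train.py.

    Assumption: train.py saved a fitted Pipeline as "model.joblib" in model_dir.
    """
    import joblib  # local import so input_fn/predict_fn work without joblib

    return joblib.load(os.path.join(model_dir, "model.joblib"))


def input_fn(request_body, request_content_type):
    """Parse a text/csv payload into a list of raw strings, with no float cast."""
    if request_content_type != "text/csv":
        raise ValueError("Unsupported content type: " + request_content_type)
    if isinstance(request_body, bytes):
        # errors="replace" keeps non-UTF-8 bytes (e.g. a stray currency sign) from crashing
        request_body = request_body.decode("utf-8", errors="replace")
    messages = []
    for row in csv.reader(io.StringIO(request_body)):
        if row:  # skip blank lines
            # One message per line; rejoin in case an unquoted message contained commas.
            messages.append(",".join(row))
    return messages


def predict_fn(input_data, model):
    """Hand the raw strings to the model; a Pipeline vectorizes them itself."""
    return model.predict(input_data)
```

Returning plain strings from input_fn only works if the loaded model knows how to turn text into features, which is why the sketch assumes a Pipeline that includes the vectorizer.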
This documentation should help with writing input_fn: https://sagemaker.readthedocs.io/en/stable/using_sklearn.html#process-input
If you still have problems, feel free to share an example of your custom code.
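The TfidfVectorizer code mentioned in the question also has to take effect at inference time, not only during training. One common way to arrange that, sketched here under assumptions: fit the vectorizer and classifier together as a scikit-learn Pipeline and save the whole object, so the model loaded for serving turns raw strings into features itself. LogisticRegression, the file name "model.joblib" and the helper train_and_save are illustrative choices, not taken from the original train.py; the file name must match whatever your model_fn loads. In train.py this would typically be called inside an `if __name__ == "__main__":` block with model_dir=os.environ["SM_MODEL_DIR"], so that importing the script for serving does not retrain.

```python
import os

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def train_and_save(texts, labels, model_dir):
    """Fit TF-IDF and a classifier as one Pipeline, then save it to model_dir."""
    pipeline = Pipeline([
        # The vectorizer is part of the saved model, so pipeline.predict()
        # accepts raw strings at inference time as well as during training.
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression()),  # hypothetical choice of classifier
    ])
    pipeline.fit(texts, labels)
    joblib.dump(pipeline, os.path.join(model_dir, "model.joblib"))
    return pipeline
```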