python-3.x - what causes an unpickling stack underflow when trying to deserialize a successfully generated SageMaker model
Problem description
I am currently setting up a pipeline in Amazon SageMaker. For that I set up an XGBoost estimator and trained it on my dataset. The training job runs as expected, and the freshly trained model is saved to the specified output bucket. Later I want to reimport the model. I do this by downloading model.tar.gz from the output bucket, extracting the model and deserializing the binary with pickle.
import pickle as pkl
import tarfile
# download the model artifact from AWS S3 (the leading "!" runs a shell command in Jupyter/IPython)
!aws s3 cp s3://my-bucket/output/sagemaker-xgboost-2021-09-06-12-19-41-306/output/model.tar.gz .
# open the downloaded model artifact, extract it and load the model as the 'model' variable
model_path = "model.tar.gz"
with tarfile.open(model_path) as tar:
    tar.extractall(path=".")
model = pkl.load(open("xgboost-model", "rb"))
Whenever I try to run this, I receive an unpickling stack underflow error:
---------------------------------------------------------------------------
UnpicklingError Traceback (most recent call last)
<ipython-input-9-b88a7424f790> in <module>
10 tar.extractall(path=".")
11
---> 12 model = pkl.load(open("xgboost-model", "rb"))
13
UnpicklingError: unpickling stack underflow
So far I have retrained the model to see whether the error also occurs with a different model file, and it does. I also downloaded model.tar.gz and validated it with gunzip. The binary file xgboost-model is extracted correctly; I just can't unpickle it. Every occurrence of this error I found on Stack Overflow points to a damaged file, but this one is generated directly by SageMaker, and I do not perform any transformation on it other than extracting it from model.tar.gz. Judging by the documentation and various tutorials, reloading a model like this seems to be quite a common use case. Locally I receive the same error with the downloaded file. I tried to step into pickle with a debugger but couldn't make much sense of it. The complete error stack looks like this:
Exception has occurred: UnpicklingError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
unpickling stack underflow
File "/sagemaker_model.py", line 10, in <module>
model = pkl.load(open('xgboost-model', 'rb'))
File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
return _run_code(code, main_globals, None,
What could cause this issue, and at which step of the process could I apply changes to fix or work around the problem?
Solution
The issue was rooted in the version of the XGBoost framework used. From version 1.3 on, the default model output changed from pickle to XGBoost's own native model format, and the SageMaker documentation does not seem to have been updated accordingly. So if you want to read the model via
tar.extractall(path=".")
model = pkl.load(open("xgboost-model", "rb"))
as described in the SageMaker docs, you need to use an earlier version of the XGBoost framework for training, e.g. 1.2-1.
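Instead of pinning the older container, the new-format file can presumably also be opened with XGBoost itself. Pickle most likely reads the first bytes of the non-pickle file as opcodes and fails as soon as one of them tries to pop from its still-empty stack, which would explain the "stack underflow" message. The following is a minimal sketch, not the documented SageMaker procedure. It assumes the xgboost Python package is installed locally (ideally in the same version as the training container), that pre-1.3 containers write a pickle, and that 1.3+ containers write a file that xgboost.Booster.load_model can read. load_sagemaker_xgboost is a hypothetical helper name.

```python
import os
import pickle
import tarfile


def load_sagemaker_xgboost(tar_path, member="xgboost-model", dest="."):
    """Extract a SageMaker XGBoost artifact and load the model inside it.

    Hypothetical helper. Assumes containers before 1.3 store a pickled model
    and 1.3+ store XGBoost's native format, readable by Booster.load_model.
    """
    with tarfile.open(tar_path) as tar:
        tar.extract(member, path=dest)
    model_file = os.path.join(dest, member)

    # A pickle written with protocol 2 or newer starts with the PROTO opcode 0x80.
    with open(model_file, "rb") as f:
        is_pickle = f.read(1) == b"\x80"

    if is_pickle:
        # Older containers: the model is a pickled object.
        with open(model_file, "rb") as f:
            return pickle.load(f)

    # Not a pickle: assume XGBoost's native format and let XGBoost read it.
    # Imported lazily so the pickle branch works without xgboost installed.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(model_file)
    return booster


# Usage (after downloading the artifact from S3):
# model = load_sagemaker_xgboost("model.tar.gz")
```

Checking the first byte, rather than catching UnpicklingError, keeps the decision explicit. It is only a heuristic, though: pickles written with protocol 0 or 1 do not start with 0x80.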