What causes an unpickling stack underflow when trying to deserialize a successfully generated SageMaker model?

Question

I am currently setting up a pipeline in Amazon SageMaker. For that, I set up an XGBoost estimator and trained it on my dataset. The training job runs as expected, and the freshly trained model is saved to the specified output bucket. Later I want to reimport the model, which I do by downloading model.tar.gz from the output bucket, extracting the model, and deserializing the binary with pickle:

import pickle as pkl
import tarfile

# download the model artifact from Amazon S3 (IPython shell escape)
!aws s3 cp s3://my-bucket/output/sagemaker-xgboost-2021-09-06-12-19-41-306/output/model.tar.gz .

# extract the downloaded model artifact, then load it as the 'model' variable
model_path = "model.tar.gz"
with tarfile.open(model_path) as tar:
    tar.extractall(path=".")

model = pkl.load(open("xgboost-model", "rb"))

Whenever I try to run this, I get an unpickling stack underflow:

---------------------------------------------------------------------------
UnpicklingError                           Traceback (most recent call last)
<ipython-input-9-b88a7424f790> in <module>
     10     tar.extractall(path=".")
     11 
---> 12 model = pkl.load(open("xgboost-model", "rb"))
     13 

UnpicklingError: unpickling stack underflow

So far I have retrained the model to see whether the error also occurs with a different model file, and it does. I also downloaded model.tar.gz and verified it with gunzip. The binary file xgboost-model is extracted correctly; I just can't unpickle it. Every occurrence of this error I found on Stack Overflow points to a damaged file, but this one is generated directly by SageMaker, and I do not perform any transformation on it other than extracting it from model.tar.gz. Judging by the documentation and various tutorials, reloading a model like this seems to be a common use case. I get the same error locally with the downloaded file. I tried stepping into pickle with a debugger but couldn't make much sense of it. The complete error stack looks like this:

Exception has occurred: UnpicklingError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
unpickling stack underflow
  File "/sagemaker_model.py", line 10, in <module>
    model = pkl.load(open('xgboost-model', 'rb'))
  File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/Cellar/python@3.9/3.9.1_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
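For context on the message itself: pickle has no mandatory file header. It reads the input byte by byte and executes each byte as an opcode on an internal stack, so a file that is not a pickle at all can fail with an "internal" error such as a stack underflow rather than a clear "wrong format" message. The standard-library sketch below reproduces the exact message with b"0", which is the POP opcode executed on an empty stack. That a native XGBoost model file trips a similar opcode early is an assumption, not something verified here; `try_unpickle` is a hypothetical helper.

```python
import pickle
import pickletools

# A genuine pickle: protocol 2+ output starts with the PROTO opcode (0x80).
good = pickle.dumps({"n_trees": 3}, protocol=4)
print(good[:1])  # b'\x80'

# pickletools.dis prints the opcode stream that pickle.load executes.
pickletools.dis(good)


def try_unpickle(data: bytes) -> str:
    """Return 'ok' if data unpickles, else the UnpicklingError message."""
    try:
        pickle.loads(data)
        return "ok"
    except pickle.UnpicklingError as exc:
        return str(exc)


print(try_unpickle(good))  # ok
# b"0" is the POP opcode; running it on an empty stack underflows.
print(try_unpickle(b"0"))  # unpickling stack underflow
```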

What could cause this issue, and at which step of the process could I make changes to fix or work around the problem?

Tags: python-3.x, pickle, amazon-sagemaker

Answer


The issue is rooted in the version of the XGBoost framework used. From version 1.3 on, the SageMaker XGBoost container no longer saves the model as a pickle (it writes XGBoost's own model format instead), and the SageMaker documentation does not seem to have been updated accordingly. So if you want to read the model via

with tarfile.open(model_path) as tar:
    tar.extractall(path=".")

model = pkl.load(open("xgboost-model", "rb"))

as described in the SageMaker docs, you need to train with an earlier version of the XGBoost framework, e.g. 1.2-1.
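If you would rather keep the newer framework version, a loader can handle both kinds of artifact. This is a minimal sketch, assuming (per the answer above) that older containers write a pickle and newer ones write XGBoost's own format. The first-byte check is a heuristic: pickles written with protocol 2 or later begin with the PROTO opcode 0x80. The names `looks_like_pickle` and `load_sagemaker_xgboost` are hypothetical; `xgboost.Booster.load_model` is the xgboost library's loader for its native format, and the xgboost package is only imported when that branch is taken.

```python
import pickle

PICKLE_PROTO = b"\x80"  # first byte of any pickle written with protocol >= 2


def looks_like_pickle(path: str) -> bool:
    """Heuristic (assumption): protocol-2+ pickles start with 0x80."""
    with open(path, "rb") as f:
        return f.read(1) == PICKLE_PROTO


def load_sagemaker_xgboost(path: str = "xgboost-model"):
    """Load an extracted SageMaker XGBoost artifact, pickled or native."""
    if looks_like_pickle(path):
        # Older containers pickled the Booster; unpickling a real Booster
        # still requires the xgboost package to be importable.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Otherwise assume XGBoost's native format. Install an xgboost version
    # compatible with the one used by the training container.
    import xgboost as xgb

    booster = xgb.Booster()
    booster.load_model(path)
    return booster
```

After extracting model.tar.gz, call `model = load_sagemaker_xgboost("xgboost-model")`. The workaround from the answer, pinning training to an older framework version, keeps the original pickle-based loading code unchanged but also keeps you on an older XGBoost release.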

