PySpark ALSModel load fails when deployed on Azure ML Service with error java.util.NoSuchElementException: Param blockSize does not exist

Problem Description

I am trying to deploy an ALS model trained with PySpark on Azure ML Service. I provide a score.py file that loads the trained model with the ALSModel.load() function. Below is the code of my score.py file.

import os
import traceback

from azureml.core.model import Model
from pyspark.ml.recommendation import ALSModel
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

# Incoming requests carry a JSON record with a string UserId.
input_schema = StructType([StructField("UserId", StringType())])
reader = spark.read.schema(input_schema)


def init():
    global model
    # "recommendation-model" is the name of the model registered under the
    # workspace; AZUREML_MODEL_DIR points at its local copy inside the container.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), "recommendation-model")
    # Load the saved model back into an ALSModel
    model = ALSModel.load(model_path)


def run(data):
    try:
        input_df = reader.json(sc.parallelize([data]))
        # `indexer` is the fitted StringIndexerModel used at training time to map
        # UserId -> UserId_index; it must be loaded/defined alongside the ALS model.
        input_df = indexer.transform(input_df)

        res = model.recommendForUserSubset(input_df[['UserId_index']], 10)

        # you can return any datatype as long as it is JSON-serializable
        return res.collect()[0]['recommendations']
    except Exception as e:
        traceback.print_exc()
        return str(e)

Below is the error I get when I deploy it as a LocalWebservice using the Model.deploy function in Azure ML Service:

Generating Docker build context.
Package creation Succeeded
Logging into Docker registry viennaglobal.azurecr.io
Logging into Docker registry viennaglobal.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM viennaglobal.azurecr.io/azureml/azureml_43542b56c5ec3e8d0f68e1556558411f
 ---> 5b3bb174ca5f
Step 2/5 : COPY azureml-app /var/azureml-app
 ---> 8e540c0746f7
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjNkN2M1ZjM4LTI1ODEtNGUxNi05NTdhLWEzOTU1OGI1ZjBiMyIsInJlc291cmNlR3JvdXBOYW1lIjoiZGV2LW9tbmljeC10ZnMtYWkiLCJhY2NvdW50TmFtZSI6ImRldi10ZnMtYWktd29ya3NwYWNlIiwid29ya3NwYWNlSWQiOiI1NjkzNGMzNC1iZmYzLTQ3OWUtODRkMy01OGI4YTc3ZTI4ZjEifSwibW9kZWxzIjp7fSwibW9kZWxzSW5mbyI6e319 | base64 --decode > /var/azureml-app/model_config_map.json
 ---> Running in 502ad8edf91e
 ---> a1bc5e0283d0
Step 4/5 : RUN mv '/var/azureml-app/tmpvxhomyin.py' /var/azureml-app/main.py
 ---> Running in eb4ec1a0b702
 ---> 6a3296fe6420
Step 5/5 : CMD ["runsvdir","/var/runit"]
 ---> Running in 834fd746afef
 ---> 5b9f8be538c0
Successfully built 5b9f8be538c0
Successfully tagged recommend-service:latest
Container (name:musing_borg, id:0f3163692f5119685eee5ed59c8e00aa96cd472f765e7db67653f1a6ce852e83) cannot be killed.
Container has been successfully cleaned up.
Image sha256:0f146f4752878bbbc0e876f4477cc2877ff12a366fca18c986f9a9c2949d028b successfully removed.
Starting Docker container...
Docker container running.
Checking container health...
ERROR - Error: Container has crashed. Did your init method fail?


Container Logs:
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,312735664+00:00 - rsyslog/run 
2020-07-30T11:57:00,312768364+00:00 - gunicorn/run 
2020-07-30T11:57:00,313017966+00:00 - iot-server/run 
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
2020-07-30T11:57:00,313969073+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,597835804+00:00 - iot-server/finish 1 0
2020-07-30T11:57:00,598826211+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (10)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 41
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/home/mmlspark/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156;1.0
    confs: [default]
    found com.microsoft.ml.spark#mmlspark_2.11;0.15 in spark-list
    found io.spray#spray-json_2.11;1.3.2 in central
    found com.microsoft.cntk#cntk;2.4 in central
    found org.openpnp#opencv;3.2.0-1 in central
    found com.jcraft#jsch;0.1.54 in central
    found org.apache.httpcomponents#httpclient;4.5.6 in central
    found org.apache.httpcomponents#httpcore;4.4.10 in central
    found commons-logging#commons-logging;1.2 in central
    found commons-codec#commons-codec;1.10 in central
    found com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 in central
:: resolution report :: resolve 318ms :: artifacts dl 11ms
    :: modules in use:
    com.jcraft#jsch;0.1.54 from central in [default]
    com.microsoft.cntk#cntk;2.4 from central in [default]
    com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 from central in [default]
    com.microsoft.ml.spark#mmlspark_2.11;0.15 from spark-list in [default]
    commons-codec#commons-codec;1.10 from central in [default]
    commons-logging#commons-logging;1.2 from central in [default]
    io.spray#spray-json_2.11;1.3.2 from central in [default]
    org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
    org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
    org.openpnp#opencv;3.2.0-1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   10  |   0   |   0   |   0   ||   10  |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: ERRORS
    unknown resolver repo-1

    unknown resolver repo-1


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156
    confs: [default]
    0 artifacts copied, 10 already retrieved (0kB/7ms)
2020-07-30 11:57:02 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Initialized PySpark session.
Initializing logger
2020-07-30 11:57:09,464 | root | INFO | Starting up app insights client
Starting up app insights client
2020-07-30 11:57:09,464 | root | INFO | Starting up request id generator
Starting up request id generator
2020-07-30 11:57:09,464 | root | INFO | Starting up app insight hooks
Starting up app insight hooks
2020-07-30 11:57:09,464 | root | INFO | Invoking user's init function
Invoking user's init function
2020-07-30 11:57:19,652 | root | ERROR | User's init function failed
User's init function failed
2020-07-30 11:57:19,656 | root | ERROR | Encountered Exception Traceback (most recent call last):
  File "/var/azureml-server/aml_blueprint.py", line 163, in register
    main.init()
  File "/var/azureml-app/main.py", line 44, in init
    model = ALSModel.load(model_path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 362, in load
    return cls.read().load(path)
  File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 300, in load
    java_obj = self._jread.load(path)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/home/mmlspark/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.util.NoSuchElementException: Param blockSize does not exist.
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
    at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:591)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:589)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.setParams(ReadWrite.scala:589)
    at org.apache.spark.ml.util.DefaultParamsReader$Metadata.getAndSetParams(ReadWrite.scala:572)
    at org.apache.spark.ml.recommendation.ALSModel$ALSModelReader.load(ALS.scala:533)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)


Worker exiting (pid: 41)
Shutting down: Master
Reason: Worker failed to boot.
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:19,833136837+00:00 - gunicorn/finish 3 0
2020-07-30T11:57:19,834216245+00:00 - Exit code 3 is not normal. Killing image.

---------------------------------------------------------------------------
WebserviceException                       Traceback (most recent call last)
<ipython-input-43-d0992ae9d1c9> in <module>
      6 local_service = Model.deploy(workspace, "recommend-service", [register_model], inference_config, deployment_config)
      7 
----> 8 local_service.wait_for_deployment()

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in decorated(self, *args, **kwargs)
     69                 raise WebserviceException('Cannot call {}() when service is {}.'.format(func.__name__, self.state),
     70                                           logger=module_logger)
---> 71             return func(self, *args, **kwargs)
     72         return decorated
     73     return decorator

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in wait_for_deployment(self, show_output)
    601                                    self._container,
    602                                    health_url=self._internal_base_url,
--> 603                                    cleanup_if_failed=False)
    604 
    605             self.state = LocalWebservice.STATE_RUNNING

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in container_health_check(docker_port, container, health_url, cleanup_if_failed)
    745             # The container has started and crashed.
    746             _raise_for_container_failure(container, cleanup_if_failed,
--> 747                                          'Error: Container has crashed. Did your init method fail?')
    748 
    749         # The container hasn't crashed, so try to ping the health endpoint.

/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in _raise_for_container_failure(container, cleanup, message)
   1258         cleanup_container(container)
   1259 
-> 1260     raise WebserviceException(message, logger=module_logger)
   1261 
   1262 

WebserviceException: WebserviceException:
    Message: Error: Container has crashed. Did your init method fail?
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Error: Container has crashed. Did your init method fail?"
    }
}

However, ALSModel.load() works fine when executed in a Jupyter notebook.

Tags: python, apache-spark, machine-learning, pyspark, azure-machine-learning-service

Solution


A few things to check (a short sketch for each follows the list):

  1. Is your model registered in the workspace? AZUREML_MODEL_DIR only works for registered models. See the Azure ML documentation on registering models, and sketch 1 below.
  2. Are you pinning the same pyspark version in your InferenceConfig environment as the one you used locally? This kind of error is typically a version mismatch: ALS only gained the blockSize param in Spark 3.0, so a model saved with PySpark 3.x cannot be read back by the Spark 2.x runtime your container logs suggest (py4j-0.10.7, mmlspark_2.11). Either retrain/save with the container's Spark version or pin a matching pyspark in the serving environment, as in sketch 2 below.
  3. Have you looked at the output of print(service.get_logs())? See our troubleshooting and debugging documentation for other things you can try, and sketch 3 below.
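
For check 1, a minimal registration sketch, assuming azureml-core and a workspace config.json; the model_path is a hypothetical local path, and the model_name "recommendation-model" must match the folder name that score.py joins onto AZUREML_MODEL_DIR:

from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()  # reads config.json for your workspace

# Register the directory produced by model.save(...) so that, at serving time,
# AZUREML_MODEL_DIR contains a "recommendation-model" folder.
register_model = Model.register(workspace=ws,
                                model_path="./recommendation-model",  # hypothetical local path
                                model_name="recommendation-model")
print(register_model.name, register_model.version)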
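For check 2, first compare versions: run import pyspark; print(pyspark.__version__) in the notebook where the model was saved, and compare it with the Spark runtime shown in the container logs. A sketch of pinning the version in the serving environment follows; the exact version string is an assumption (use whatever your notebook reports), and note that pinning pyspark via pip only helps if the base image also ships a matching Java/Spark runtime:

from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig

env = Environment(name="als-serving-env")
env.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    "azureml-defaults",
    "pyspark==2.4.5",  # hypothetical pin: use the exact version the model was trained with
])

inference_config = InferenceConfig(entry_script="score.py", environment=env)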
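For check 3, a sketch of pulling the container logs after a failed local deployment; the names (workspace, register_model, deployment_config) match the traceback above, and the exception from wait_for_deployment is caught so the logs can still be printed:

from azureml.core.model import Model
from azureml.exceptions import WebserviceException

local_service = Model.deploy(workspace, "recommend-service", [register_model],
                             inference_config, deployment_config)
try:
    local_service.wait_for_deployment(show_output=True)
except WebserviceException:
    pass  # deployment crashed; the logs below usually show why init() failed

print(local_service.get_logs())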
