python - PySpark ALSModel load fails when deployed on Azure ML service with error java.util.NoSuchElementException: Param blockSize does not exist
Problem description
I am trying to deploy an ALS model trained with PySpark on the Azure ML service. I provide a score.py file that loads the trained model with the ALSModel.load() function. Below is the code of my score.py file.
import os
import traceback

from azureml.core.model import Model
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, StringType
from pyspark.sql import SQLContext
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession

input_schema = StructType([StructField("UserId", StringType())])
reader = spark.read
reader.schema(input_schema)

def init():
    global model
    # "recommendation-model" is the name the model is registered under in the
    # workspace; AZUREML_MODEL_DIR points at its local copy inside the container.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), "recommendation-model")
    # Load the saved model files back into an ALSModel
    model = ALSModel.load(model_path)

def run(data):
    try:
        input_df = reader.json(sc.parallelize([data]))
        # `indexer` (a fitted StringIndexerModel) is assumed to be loaded elsewhere
        input_df = indexer.transform(input_df)
        result = model.recommendForUserSubset(input_df[['UserId_index']], 10)
        # you can return any datatype as long as it is JSON-serializable
        return result.collect()[0]['recommendations']
    except Exception as e:
        traceback.print_exc()
        return str(e)
Here is the error I get when deploying it as a LocalWebService with the Model.deploy function in the Azure ML service:
Generating Docker build context.
Package creation Succeeded
Logging into Docker registry viennaglobal.azurecr.io
Logging into Docker registry viennaglobal.azurecr.io
Building Docker image from Dockerfile...
Step 1/5 : FROM viennaglobal.azurecr.io/azureml/azureml_43542b56c5ec3e8d0f68e1556558411f
---> 5b3bb174ca5f
Step 2/5 : COPY azureml-app /var/azureml-app
---> 8e540c0746f7
Step 3/5 : RUN mkdir -p '/var/azureml-app' && echo eyJhY2NvdW50Q29udGV4dCI6eyJzdWJzY3JpcHRpb25JZCI6IjNkN2M1ZjM4LTI1ODEtNGUxNi05NTdhLWEzOTU1OGI1ZjBiMyIsInJlc291cmNlR3JvdXBOYW1lIjoiZGV2LW9tbmljeC10ZnMtYWkiLCJhY2NvdW50TmFtZSI6ImRldi10ZnMtYWktd29ya3NwYWNlIiwid29ya3NwYWNlSWQiOiI1NjkzNGMzNC1iZmYzLTQ3OWUtODRkMy01OGI4YTc3ZTI4ZjEifSwibW9kZWxzIjp7fSwibW9kZWxzSW5mbyI6e319 | base64 --decode > /var/azureml-app/model_config_map.json
---> Running in 502ad8edf91e
---> a1bc5e0283d0
Step 4/5 : RUN mv '/var/azureml-app/tmpvxhomyin.py' /var/azureml-app/main.py
---> Running in eb4ec1a0b702
---> 6a3296fe6420
Step 5/5 : CMD ["runsvdir","/var/runit"]
---> Running in 834fd746afef
---> 5b9f8be538c0
Successfully built 5b9f8be538c0
Successfully tagged recommend-service:latest
Container (name:musing_borg, id:0f3163692f5119685eee5ed59c8e00aa96cd472f765e7db67653f1a6ce852e83) cannot be killed.
Container has been successfully cleaned up.
Image sha256:0f146f4752878bbbc0e876f4477cc2877ff12a366fca18c986f9a9c2949d028b successfully removed.
Starting Docker container...
Docker container running.
Checking container health...
ERROR - Error: Container has crashed. Did your init method fail?
Container Logs:
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,312735664+00:00 - rsyslog/run
2020-07-30T11:57:00,312768364+00:00 - gunicorn/run
2020-07-30T11:57:00,313017966+00:00 - iot-server/run
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
2020-07-30T11:57:00,313969073+00:00 - nginx/run
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:00,597835804+00:00 - iot-server/finish 1 0
2020-07-30T11:57:00,598826211+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 19.9.0
Listening at: http://127.0.0.1:31311 (10)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 41
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by bash)
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/home/mmlspark/lib/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.microsoft.ml.spark#mmlspark_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156;1.0
confs: [default]
found com.microsoft.ml.spark#mmlspark_2.11;0.15 in spark-list
found io.spray#spray-json_2.11;1.3.2 in central
found com.microsoft.cntk#cntk;2.4 in central
found org.openpnp#opencv;3.2.0-1 in central
found com.jcraft#jsch;0.1.54 in central
found org.apache.httpcomponents#httpclient;4.5.6 in central
found org.apache.httpcomponents#httpcore;4.4.10 in central
found commons-logging#commons-logging;1.2 in central
found commons-codec#commons-codec;1.10 in central
found com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 in central
:: resolution report :: resolve 318ms :: artifacts dl 11ms
:: modules in use:
com.jcraft#jsch;0.1.54 from central in [default]
com.microsoft.cntk#cntk;2.4 from central in [default]
com.microsoft.ml.lightgbm#lightgbmlib;2.1.250 from central in [default]
com.microsoft.ml.spark#mmlspark_2.11;0.15 from spark-list in [default]
commons-codec#commons-codec;1.10 from central in [default]
commons-logging#commons-logging;1.2 from central in [default]
io.spray#spray-json_2.11;1.3.2 from central in [default]
org.apache.httpcomponents#httpclient;4.5.6 from central in [default]
org.apache.httpcomponents#httpcore;4.4.10 from central in [default]
org.openpnp#opencv;3.2.0-1 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 10 | 0 | 0 | 0 || 10 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: ERRORS
unknown resolver repo-1
unknown resolver repo-1
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
:: retrieving :: org.apache.spark#spark-submit-parent-e07358bb-d354-4f41-aa4c-f0aa73bb0156
confs: [default]
0 artifacts copied, 10 already retrieved (0kB/7ms)
2020-07-30 11:57:02 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Initialized PySpark session.
Initializing logger
2020-07-30 11:57:09,464 | root | INFO | Starting up app insights client
Starting up app insights client
2020-07-30 11:57:09,464 | root | INFO | Starting up request id generator
Starting up request id generator
2020-07-30 11:57:09,464 | root | INFO | Starting up app insight hooks
Starting up app insight hooks
2020-07-30 11:57:09,464 | root | INFO | Invoking user's init function
Invoking user's init function
2020-07-30 11:57:19,652 | root | ERROR | User's init function failed
User's init function failed
2020-07-30 11:57:19,656 | root | ERROR | Encountered Exception Traceback (most recent call last):
File "/var/azureml-server/aml_blueprint.py", line 163, in register
main.init()
File "/var/azureml-app/main.py", line 44, in init
model = ALSModel.load(model_path)
File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 362, in load
return cls.read().load(path)
File "/home/mmlspark/lib/spark/python/pyspark/ml/util.py", line 300, in load
java_obj = self._jread.load(path)
File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/home/mmlspark/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/home/mmlspark/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.util.NoSuchElementException: Param blockSize does not exist.
at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
at org.apache.spark.ml.param.Params$$anonfun$getParam$2.apply(params.scala:729)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params$class.getParam(params.scala:728)
at org.apache.spark.ml.PipelineStage.getParam(Pipeline.scala:42)
at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:591)
at org.apache.spark.ml.util.DefaultParamsReader$Metadata$$anonfun$setParams$1.apply(ReadWrite.scala:589)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.ml.util.DefaultParamsReader$Metadata.setParams(ReadWrite.scala:589)
at org.apache.spark.ml.util.DefaultParamsReader$Metadata.getAndSetParams(ReadWrite.scala:572)
at org.apache.spark.ml.recommendation.ALSModel$ALSModelReader.load(ALS.scala:533)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Worker exiting (pid: 41)
Shutting down: Master
Reason: Worker failed to boot.
/bin/bash: /azureml-envs/azureml_7fbe163ce1d4208cd897650a64b7a54d/lib/libtinfo.so.5: no version information available (required by /bin/bash)
2020-07-30T11:57:19,833136837+00:00 - gunicorn/finish 3 0
2020-07-30T11:57:19,834216245+00:00 - Exit code 3 is not normal. Killing image.
---------------------------------------------------------------------------
WebserviceException Traceback (most recent call last)
<ipython-input-43-d0992ae9d1c9> in <module>
6 local_service = Model.deploy(workspace, "recommend-service", [register_model], inference_config, deployment_config)
7
----> 8 local_service.wait_for_deployment()
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in decorated(self, *args, **kwargs)
69 raise WebserviceException('Cannot call {}() when service is {}.'.format(func.__name__, self.state),
70 logger=module_logger)
---> 71 return func(self, *args, **kwargs)
72 return decorated
73 return decorator
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/core/webservice/local.py in wait_for_deployment(self, show_output)
601 self._container,
602 health_url=self._internal_base_url,
--> 603 cleanup_if_failed=False)
604
605 self.state = LocalWebservice.STATE_RUNNING
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in container_health_check(docker_port, container, health_url, cleanup_if_failed)
745 # The container has started and crashed.
746 _raise_for_container_failure(container, cleanup_if_failed,
--> 747 'Error: Container has crashed. Did your init method fail?')
748
749 # The container hasn't crashed, so try to ping the health endpoint.
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/_model_management/_util.py in _raise_for_container_failure(container, cleanup, message)
1258 cleanup_container(container)
1259
-> 1260 raise WebserviceException(message, logger=module_logger)
1261
1262
WebserviceException: WebserviceException:
Message: Error: Container has crashed. Did your init method fail?
InnerException None
ErrorResponse
{
"error": {
"message": "Error: Container has crashed. Did your init method fail?"
}
}
However, ALSModel.load() works fine when executed in a Jupyter notebook.
Solution
A few things to check:
- Is your model registered in the workspace? AZUREML_MODEL_DIR only works for registered models. See the documentation on registering models.
- Are you specifying the same version of pyspark in your InferenceConfig environment as the one you trained with locally? This kind of error typically comes from a version mismatch: the blockSize param was only added to ALS in Spark 3.0, so a model saved with Spark 3.x cannot be loaded by a Spark 2.x runtime (the py4j-0.10.7 path under /home/mmlspark/lib/spark in your container logs suggests the image bundles Spark 2.4).
- Have you looked at the output of
print(service.get_logs())
? See our troubleshooting and debugging documentation for other things you can try.
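One way to confirm a version mismatch without redeploying: every saved Spark ML model records the writing Spark version in a one-line JSON file under <model_path>/metadata/. A minimal sketch of reading it back (the helper name is mine, not a Spark or Azure ML API):

```python
import glob
import json
import os

def saved_spark_version(model_path: str) -> str:
    """Return the Spark version that wrote a saved Spark ML model.

    Spark ML's DefaultParamsWriter persists a single-line JSON record
    (fields include "class", "sparkVersion", "paramMap") in a part file
    under <model_path>/metadata/.
    """
    part_files = sorted(glob.glob(os.path.join(model_path, "metadata", "part-*")))
    if not part_files:
        raise FileNotFoundError("no metadata part file under %s" % model_path)
    with open(part_files[0]) as f:
        return json.loads(f.readline())["sparkVersion"]
```

If this reports 3.x for your saved model while the serving image bundles Spark 2.x, the load is expected to fail on params (such as blockSize) that the older runtime does not know about.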