python - Docker TensorFlow Serving + Google Cloud Storage takes about 20 minutes to start
Problem description
I am using TensorFlow Serving to serve a model that predicts the class of an image.
Here is my Dockerfile:
FROM tensorflow/serving:2.2.0
# Copy files
COPY container-files /
# Only root can access the Google Cloud Storage private key.
RUN chown root:root /etc/gs && \
chmod 640 /etc/gs
ENV GOOGLE_APPLICATION_CREDENTIALS=/etc/gs/credential.json
# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models
# The only required piece is the model name in order to differentiate endpoints
ENV MODEL_NAME=inception
# Create models dir
RUN mkdir -p ${MODEL_BASE_PATH}/${MODEL_NAME}
# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server --port=8500 --rest_api_port=8080 \
--model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh
# Expose ports
# gRPC
# EXPOSE 8500
# REST
EXPOSE 8080
# Remove entrypoint from parent image
ENTRYPOINT []
CMD ["/usr/bin/tf_serving_entrypoint.sh"]
The Dockerfile above builds an image with the model baked in. That image starts in just a few seconds.
$ docker run -it -p 8080:8080 tfs-model:001-dev
2020-07-22 17:05:04.322982: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: inception model_base_path: /models/inception
2020-07-22 17:05:04.323206: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-22 17:05:04.323220: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: inception
2020-07-22 17:05:04.424089: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: inception version: 1}
2020-07-22 17:05:04.424164: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2020-07-22 17:05:04.424196: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2020-07-22 17:05:04.424295: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/inception/1
2020-07-22 17:05:04.594622: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-22 17:05:04.594652: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/inception/1
2020-07-22 17:05:04.594801: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-22 17:05:05.006016: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-22 17:05:06.465954: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/inception/1
2020-07-22 17:05:06.952038: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 2527751 microseconds.
2020-07-22 17:05:07.040463: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/inception/1/assets.extra/tf_serving_warmup_requests
2020-07-22 17:05:07.042644: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: inception version: 1}
2020-07-22 17:05:07.044539: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-22 17:05:07.045586: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8080 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
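But I need to keep the model in a Google Cloud Storage bucket.
So in the Dockerfile above I replaced ENV MODEL_BASE_PATH=/models
with ENV MODEL_BASE_PATH=gs://my_bucket/model
so that TF Serving fetches the model from Google Cloud Storage.
The image still runs, but now it takes about 20 minutes to start: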
$ docker run -it -p 8080:8080 tfs-model:001-dev
2020-07-22 16:06:42.998266: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config: model_name: inception model_base_path: gs://my_bucket/model/inception
2020-07-22 16:06:42.998465: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-22 16:06:42.998477: I tensorflow_serving/model_servers/server_core.cc:575] (Re-)adding model: inception
2020-07-22 16:06:51.742294: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: inception version: 1}
2020-07-22 16:06:51.742359: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2020-07-22 16:06:51.742396: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2020-07-22 16:06:53.834245: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: gs://my_bucket/model/inception/1
2020-07-22 16:06:55.691746: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-22 16:06:55.691830: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: gs://my_bucket/model/inception/1
2020-07-22 16:06:56.379065: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-22 16:06:56.834227: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-22 16:23:30.737994: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: gs://my_bucket/model/inception/1
2020-07-22 16:23:31.237985: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 997403750 microseconds.
2020-07-22 16:23:32.011964: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at gs://my_bucket/model/inception/1/assets.extra/tf_serving_warmup_requests
2020-07-22 16:23:38.855881: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: inception version: 1}
2020-07-22 16:23:38.862442: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-07-22 16:23:38.865961: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8080 ...
Model files:
-rw-rw-r-- 1 kleysonr kleysonr 9695088 Jul 22 10:57 models/inception/1/saved_model.pb
-rw-rw-r-- 1 kleysonr kleysonr 214767 Jul 22 10:59 models/inception/1/variables/variables.data-00000-of-00002
-rw-rw-r-- 1 kleysonr kleysonr 261431716 Jul 22 10:59 models/inception/1/variables/variables.data-00001-of-00002
-rw-rw-r-- 1 kleysonr kleysonr 51348 Jul 22 10:59 models/inception/1/variables/variables.index
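To put the two runs side by side, here is a rough back-of-the-envelope calculation. It uses only the file sizes listed above and the "Took ... microseconds" values from the two logs. The "effective throughput" figure assumes the whole load time was spent transferring these four files, which is a simplification and may not be what actually happens inside the loader.

```python
# File sizes in bytes, copied from the model file listing above.
file_sizes = {
    "saved_model.pb": 9695088,
    "variables/variables.data-00000-of-00002": 214767,
    "variables/variables.data-00001-of-00002": 261431716,
    "variables/variables.index": 51348,
}

# "Took N microseconds" values copied from the two SavedModel load log lines.
local_load_us = 2527751      # model baked into the image
gcs_load_us = 997403750      # model read from gs://my_bucket/model

total_bytes = sum(file_sizes.values())
local_seconds = local_load_us / 1_000_000
gcs_seconds = gcs_load_us / 1_000_000

# How many times slower the GCS load was than the local load.
slowdown = gcs_seconds / local_seconds

# Simplifying assumption: all of the GCS load time was spent moving these bytes.
effective_bytes_per_second = total_bytes / gcs_seconds

print(f"model size:        {total_bytes / 1e6:.1f} MB")
print(f"local load:        {local_seconds:.1f} s")
print(f"GCS load:          {gcs_seconds / 60:.1f} min")
print(f"slowdown:          {slowdown:.0f}x")
print(f"effective rate:    {effective_bytes_per_second / 1e6:.2f} MB/s")
```

The model is only about 271 MB, so an effective rate well under 1 MB/s is far below what a plain download of the same files would normally be expected to achieve.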
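Some of you may say this could be a local network problem and/or the time it takes to download the files to my local machine. But I created a new Google Cloud Run instance from the same Docker image and hit the same problem.
Why does TF Serving take so long to start when serving a model from Google Cloud Storage? How can I make it start quickly?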
Solution
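One workaround worth trying, offered as a sketch and not as a confirmed fix for this setup, is to copy the model from the bucket to local disk before tensorflow_model_server starts, and then keep MODEL_BASE_PATH=/models as in the fast case. The script below assumes the google-cloud-storage Python package is installed wherever it runs (the stock tensorflow/serving image may not include Python, so that part is an assumption), and that credentials are picked up from GOOGLE_APPLICATION_CREDENTIALS. The function names download_model and local_path_for are made up for this example.

```python
import os


def local_path_for(blob_name: str, prefix: str, dest_dir: str) -> str:
    """Map an object name under `prefix` to a file path under `dest_dir`."""
    relative = blob_name[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_model(bucket_name: str, prefix: str, dest_dir: str) -> int:
    """Copy every object under gs://bucket_name/prefix/ into dest_dir.

    Returns the number of files written.
    """
    # Imported lazily so the path helper above works without the package.
    from google.cloud import storage  # pip install google-cloud-storage

    # Force a trailing slash so "model/inception" does not also match
    # sibling prefixes such as "model/inception_v2".
    prefix = prefix.rstrip("/") + "/"

    client = storage.Client()  # uses GOOGLE_APPLICATION_CREDENTIALS
    count = 0
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip "directory" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)
        count += 1
    return count


# Example call, using the bucket layout that appears in the logs above:
#   download_model("my_bucket", "model/inception", "/models/inception")
```

If this runs before the entrypoint script, the server should see /models/inception/1/... exactly as in the first Dockerfile. Whether it removes the 20-minute delay here is untested.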