Docker TensorFlow Serving + Google Cloud Storage takes about 20 minutes to start

Problem description

I am using TensorFlow Serving to serve a model that predicts the class of an image.

Here is my Dockerfile:

FROM tensorflow/serving:2.2.0

# Copy files
COPY container-files /

# Only root may access the Google Cloud Storage private key.
RUN chown root:root /etc/gs && \
    chmod 640 /etc/gs

ENV GOOGLE_APPLICATION_CREDENTIALS=/etc/gs/credential.json

# Set where models should be stored in the container
ENV MODEL_BASE_PATH=/models

# The only required piece is the model name in order to differentiate endpoints
ENV MODEL_NAME=inception

# Create models dir
RUN mkdir -p ${MODEL_BASE_PATH}/${MODEL_NAME}

# Create a script that runs the model server so we can use environment variables
# while also passing in arguments from the docker command line
RUN echo '#!/bin/bash \n\n\
tensorflow_model_server --port=8500 --rest_api_port=8080 \
--model_name=${MODEL_NAME} --model_base_path=${MODEL_BASE_PATH}/${MODEL_NAME} \
"$@"' > /usr/bin/tf_serving_entrypoint.sh \
&& chmod +x /usr/bin/tf_serving_entrypoint.sh

# Expose ports
# gRPC
# EXPOSE 8500

# REST
EXPOSE 8080

# Remove entrypoint from parent image
ENTRYPOINT []

CMD ["/usr/bin/tf_serving_entrypoint.sh"]

The Dockerfile above builds an image with the model baked in. That image takes only a few seconds to start.

$ docker run -it -p 8080:8080 tfs-model:001-dev
2020-07-22 17:05:04.322982: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config:  model_name: inception model_base_path: /models/inception
2020-07-22 17:05:04.323206: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-22 17:05:04.323220: I tensorflow_serving/model_servers/server_core.cc:575]  (Re-)adding model: inception
2020-07-22 17:05:04.424089: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: inception version: 1}
2020-07-22 17:05:04.424164: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2020-07-22 17:05:04.424196: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2020-07-22 17:05:04.424295: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/inception/1
2020-07-22 17:05:04.594622: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-22 17:05:04.594652: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /models/inception/1
2020-07-22 17:05:04.594801: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-22 17:05:05.006016: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-22 17:05:06.465954: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: /models/inception/1
2020-07-22 17:05:06.952038: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 2527751 microseconds.
2020-07-22 17:05:07.040463: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /models/inception/1/assets.extra/tf_serving_warmup_requests
2020-07-22 17:05:07.042644: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: inception version: 1}
2020-07-22 17:05:07.044539: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-07-22 17:05:07.045586: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8080 ...
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

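However, I need to keep the model in a Google Cloud Storage bucket.

So in the Dockerfile above I replaced ENV MODEL_BASE_PATH=/models with ENV MODEL_BASE_PATH=gs://my_bucket/model so that TF Serving fetches the model from Google Cloud Storage.

The image still runs, but now it takes about 20 minutes to start: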

$ docker run -it -p 8080:8080 tfs-model:001-dev
2020-07-22 16:06:42.998266: I tensorflow_serving/model_servers/server.cc:86] Building single TensorFlow model file config:  model_name: inception model_base_path: gs://my_bucket/model/inception
2020-07-22 16:06:42.998465: I tensorflow_serving/model_servers/server_core.cc:464] Adding/updating models.
2020-07-22 16:06:42.998477: I tensorflow_serving/model_servers/server_core.cc:575]  (Re-)adding model: inception
2020-07-22 16:06:51.742294: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: inception version: 1}
2020-07-22 16:06:51.742359: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: inception version: 1}
2020-07-22 16:06:51.742396: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: inception version: 1}
2020-07-22 16:06:53.834245: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: gs://my_bucket/model/inception/1
2020-07-22 16:06:55.691746: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2020-07-22 16:06:55.691830: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: gs://my_bucket/model/inception/1
2020-07-22 16:06:56.379065: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-22 16:06:56.834227: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-22 16:23:30.737994: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:183] Running initialization op on SavedModel bundle at path: gs://my_bucket/model/inception/1
2020-07-22 16:23:31.237985: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags { serve }; Status: success: OK. Took 997403750 microseconds.
2020-07-22 16:23:32.011964: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at gs://my_bucket/model/inception/1/assets.extra/tf_serving_warmup_requests
2020-07-22 16:23:38.855881: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: inception version: 1}
2020-07-22 16:23:38.862442: I tensorflow_serving/model_servers/server.cc:355] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
[evhttp_server.cc : 238] NET_LOG: Entering the event loop ...
2020-07-22 16:23:38.865961: I tensorflow_serving/model_servers/server.cc:375] Exporting HTTP/REST API at:localhost:8080 ...
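
For reference, the gaps between the timestamps in the log above can be computed with a short Python sketch. The timestamps are copied from the log; the labels are informal names chosen here, not TensorFlow Serving terminology.

```python
from datetime import datetime

# Timestamps copied from the GCS startup log above; labels are informal.
LOG_EVENTS = [
    ("server start", "2020-07-22 16:06:42.998266"),
    ("restoring SavedModel bundle", "2020-07-22 16:06:56.834227"),
    ("running initialization op", "2020-07-22 16:23:30.737994"),
    ("servable loaded", "2020-07-22 16:23:38.855881"),
]


def parse(ts):
    """Parse a TensorFlow Serving log timestamp into a datetime."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def step_durations(events):
    """Return (from_label, to_label, seconds) for each consecutive pair."""
    result = []
    for (a_label, a_ts), (b_label, b_ts) in zip(events, events[1:]):
        seconds = (parse(b_ts) - parse(a_ts)).total_seconds()
        result.append((a_label, b_label, seconds))
    return result


for a, b, secs in step_durations(LOG_EVENTS):
    print(f"{a} -> {b}: {secs:.1f} s")
```

Nearly all of the startup time, about 994 of roughly 1016 seconds, falls between the "Restoring SavedModel bundle" line and the "Running initialization op" line.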

Model files

-rw-rw-r-- 1 kleysonr kleysonr   9695088 Jul 22 10:57 models/inception/1/saved_model.pb
-rw-rw-r-- 1 kleysonr kleysonr    214767 Jul 22 10:59 models/inception/1/variables/variables.data-00000-of-00002
-rw-rw-r-- 1 kleysonr kleysonr 261431716 Jul 22 10:59 models/inception/1/variables/variables.data-00001-of-00002
-rw-rw-r-- 1 kleysonr kleysonr     51348 Jul 22 10:59 models/inception/1/variables/variables.index
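
As a rough sanity check, the file sizes above can be combined with the two load times reported in the logs (2527751 microseconds locally, 997403750 microseconds from the bucket). The sketch below assumes, purely for illustration, that the whole load time is spent transferring each byte exactly once, which may not be how TensorFlow actually reads from GCS.

```python
# File sizes in bytes, copied from the listing above.
FILE_SIZES = {
    "saved_model.pb": 9695088,
    "variables/variables.data-00000-of-00002": 214767,
    "variables/variables.data-00001-of-00002": 261431716,
    "variables/variables.index": 51348,
}

# "Took ... microseconds" values copied from the two logs.
LOCAL_LOAD_US = 2527751
GCS_LOAD_US = 997403750

total_bytes = sum(FILE_SIZES.values())
gcs_seconds = GCS_LOAD_US / 1_000_000

# Illustrative assumption: every byte is transferred once during the load.
effective_bytes_per_s = total_bytes / gcs_seconds
slowdown = GCS_LOAD_US / LOCAL_LOAD_US

print(f"total model size: {total_bytes / 1e6:.1f} MB")
print(f"effective rate:   {effective_bytes_per_s / 1e6:.2f} MB/s")
print(f"GCS vs local:     {slowdown:.0f}x slower")
```

Under that assumption the effective rate is about 0.27 MB/s for roughly 271 MB of model files, and the load from GCS is close to 400 times slower than the local one.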

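Some of you might say this is a local network problem, or simply the time needed to download the files to my machine. But I created a new Google Cloud Run instance from the same Docker image and hit the same problem.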
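
One way to test the network hypothesis directly is to time a plain download of the same objects, outside TensorFlow Serving. The sketch below is an assumption about how such a probe could look, not something from the original setup: it uses the google-cloud-storage Python client, the bucket and prefix are taken from the log paths, and time_download, main and /tmp/gcs-probe are hypothetical names.

```python
import os
import time


def time_download(blobs, dest_dir):
    """Download each blob into dest_dir and return (total_bytes, seconds).

    `blobs` is any iterable of objects exposing `.name`, `.size` and
    `.download_to_filename(path)`, as google-cloud-storage Blob objects do.
    """
    os.makedirs(dest_dir, exist_ok=True)
    total_bytes = 0
    start = time.perf_counter()
    for blob in blobs:
        if blob.name.endswith("/"):
            continue  # skip "directory" placeholder objects
        # Flatten the object path into a single file name for this probe.
        target = os.path.join(dest_dir, blob.name.replace("/", "_"))
        blob.download_to_filename(target)
        total_bytes += blob.size or 0
    return total_bytes, time.perf_counter() - start


def main():
    # Needs the google-cloud-storage package and GOOGLE_APPLICATION_CREDENTIALS.
    from google.cloud import storage

    client = storage.Client()
    # Bucket and prefix taken from the gs:// path in the log (assumed layout).
    blobs = client.list_blobs("my_bucket", prefix="model/inception/1/")
    n, secs = time_download(blobs, "/tmp/gcs-probe")
    print(f"{n} bytes in {secs:.1f} s ({n / secs / 1e6:.2f} MB/s)")
```

Calling main() from inside the container (or on Cloud Run) runs the probe. If it finishes in seconds while TensorFlow Serving still takes many minutes, raw bandwidth is unlikely to be the bottleneck.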

Why does TF Serving take so long to start when the model is served from Google Cloud Storage, and how can I make it start quickly?

Tags: python, docker, tensorflow, tensorflow2.0, tensorflow-serving

Solution

