python - AWS Elastic Inference, TensorFlow 2 - Accelerator Doesn't have Variable Support, Exiting
Question
I am trying to use AWS Elastic Inference with a TensorFlow 2 model. Whether I use Elastic Inference with TensorFlow Serving or with the TensorFlow Keras API, I get the same error in both cases: Accelerator Doesn't have Variable Support.
ubuntu@xx-xxx-xx-0-xxx:~/tensorflow-serving-2-0-0-ei-1-5$ ./amazonei_tensorflow_model_server --rest_api_port=8501 --model_name=testm --model_base_path=/home/ubuntu/testm
2021-01-26 19:37:28.781844: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config: model_name: testm model_base_path: /home/ubuntu/testm
2021-01-26 19:37:28.782019: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2021-01-26 19:37:28.782044: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: testm
2021-01-26 19:37:28.882449: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: testm version: 1}
2021-01-26 19:37:28.882485: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: testm version: 1}
2021-01-26 19:37:28.882497: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: testm version: 1}
2021-01-26 19:37:28.882524: I external/ei_for_tf/ei_for_tf/util/saved_model_util.h:75] Reading SavedModel from: /home/ubuntu/testm/1
2021-01-26 19:37:28.940140: I external/ei_for_tf/ei_for_tf/util/saved_model_util.h:98] Reading meta graph with tags { serve }
2021-01-26 19:37:29.069412: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /home/ubuntu/testm/1
2021-01-26 19:37:29.103278: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-01-26 19:37:29.169323: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-01-26 19:37:29.358026: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
Using Amazon Elastic Inference Client Library Version: 1.7.0
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-926d6d89ee644dc19d0ada17fcbd423d
Elastic Inference Accelerator Type: eia2.medium
Elastic Inference Accelerator Ordinal: 0
[Tue Jan 26 19:37:41 2021, 810560us] [Execution Engine] Error getting application context for [TensorFlow][2]
[Tue Jan 26 19:37:41 2021, 810620us] [Execution Engine][TensorFlow][2] Failed - Last Error:
EI Error Code: [1, 14, 36]
EI Error Description: Service not available. Please wait for sometime. Also, make sure the EI setup is correct. Quick setup instructions - https://aws.amazon.com/blogs/machine-learning/launch-ei-accelerators-in-minutes-with-the-amazon-elastic-inference-setup-tool-for-ec2. Setup validation - https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-troubleshooting.html#ei-activation
EI Request ID: -- EI Accelerator ID: eia-926d6d89ee644dc19d0ada17fcbd423d
EI Client Version: 1.7.0
2021-01-26 19:37:41.810700: E external/ei_for_tf/ei_for_tf/graph_optimizer/ei_graph_optimizer.cc:114] Accelerator Doesn't have Variable Support, Exiting
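I am using the amazonei-tensorflow library from this S3 repo.
I believe the machine is set up correctly for Elastic Inference, because the EISetupValidator script, run as suggested here, passes without any issues: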
ubuntu@xx-xxx-xx-x-xxx:~$ python EISetupValidator.py
All the validation checks passed for Amazon EI from this instance - i-0ec6f6b0899198c0b
I have not found any other references to this error online. I would appreciate any help or pointers. Thanks in advance.
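Answer
I received the following reply from AWS Support:
The error "Accelerator Doesn't have Variable Support, Exiting" can be ignored if it is preceded by a connection error. The earlier error indicates that the problem is related to the EI setup.
More information on EI error codes: https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-error-codes.html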
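The reply above amounts to reading the log from the first EI error rather than from the final message. A minimal sketch of that triage in Python, assuming the log format shown in the question; the function name `diagnose` and the returned keys are illustrative and not part of any AWS tool, and the sample log is abridged from the question:

```python
import re

# Matches the line printed by the EI client, e.g. "EI Error Code: [1, 14, 36]".
EI_CODE_RE = re.compile(r"EI Error Code:\s*\[([\d,\s]+)\]")
VARIABLE_SUPPORT_MSG = "Accelerator Doesn't have Variable Support"


def diagnose(log_text):
    """Summarize the EI errors found in a model-server log (hypothetical helper)."""
    codes = None
    description = None
    variable_support_seen = False
    for line in log_text.splitlines():
        match = EI_CODE_RE.search(line)
        if match and codes is None:
            # Keep only the first EI error code, since later errors may be consequences.
            codes = [int(part) for part in match.group(1).split(",")]
        elif line.startswith("EI Error Description:") and description is None:
            description = line.split(":", 1)[1].strip()
        elif VARIABLE_SUPPORT_MSG in line:
            variable_support_seen = True
    return {
        "ei_error_code": codes,
        "ei_error_description": description,
        "variable_support_error": variable_support_seen,
        # Per the support reply: an earlier EI error points at the setup,
        # and the variable-support message can then be ignored.
        "likely_setup_problem": codes is not None,
    }


# Abridged excerpt of the log from the question.
sample = """\
[Tue Jan 26 19:37:41 2021, 810620us] [Execution Engine][TensorFlow][2] Failed - Last Error:
EI Error Code: [1, 14, 36]
EI Error Description: Service not available. Please wait for sometime.
2021-01-26 19:37:41.810700: E external/ei_for_tf/ei_for_tf/graph_optimizer/ei_graph_optimizer.cc:114] Accelerator Doesn't have Variable Support, Exiting
"""

result = diagnose(sample)
print(result)
```

If `likely_setup_problem` is true, look up the reported code on the EI error codes page before investigating the model itself.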
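So check the setup instructions again. My problem was that I had not attached the correct security group to the VPC endpoint created for the accelerator. After fixing that, Elastic Inference worked!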
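A rough way to sanity-check that kind of security group mismatch, sketched in Python. It assumes the rules are dicts shaped like the `IpPermissions` entries returned by boto3's `describe_security_groups`, and that the VPC endpoint's security group must allow inbound HTTPS (port 443) from the instance's security group, which is my reading of the EI setup guide and should be verified against it. The helper name and the group IDs are hypothetical:

```python
HTTPS_PORT = 443


def allows_https_from(ip_permissions, source_group_id):
    """Return True if any inbound rule admits port 443 from the given security group.

    Only group-to-group rules (UserIdGroupPairs) are checked; CIDR-based
    rules are ignored in this sketch.
    """
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            port_ok = True
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= HTTPS_PORT <= rule.get("ToPort", 65535)
        else:
            port_ok = False
        if not port_ok:
            continue
        sources = {pair.get("GroupId") for pair in rule.get("UserIdGroupPairs", [])}
        if source_group_id in sources:
            return True
    return False


# Hypothetical IDs: sg-instance is attached to the EC2 instance,
# and these are the inbound rules of the endpoint's security group.
endpoint_rules = [
    {
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "UserIdGroupPairs": [{"GroupId": "sg-instance"}],
    }
]

ok = allows_https_from(endpoint_rules, "sg-instance")
wrong_group = allows_https_from(endpoint_rules, "sg-other")
print(ok, wrong_group)
```

A `False` result for the instance's own group would be consistent with the "Service not available" error in the question, though other setup problems can produce the same symptom.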