AWS Elastic Inference, TensorFlow 2 - Accelerator Doesn't have Variable Support, Exiting

Problem description

I am trying to use AWS Elastic Inference with a TensorFlow 2 model. Whether I use Elastic Inference through TensorFlow Serving or through the TensorFlow Keras API, I hit the same error in both cases: Accelerator Doesn't have Variable Support.

ubuntu@xx-xxx-xx-0-xxx:~/tensorflow-serving-2-0-0-ei-1-5$ ./amazonei_tensorflow_model_server --rest_api_port=8501 --model_name=testm --model_base_path=/home/ubuntu/testm
2021-01-26 19:37:28.781844: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config:  model_name: testm model_base_path: /home/ubuntu/testm
2021-01-26 19:37:28.782019: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.                                                                                                  
2021-01-26 19:37:28.782044: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: testm                                                                                               
2021-01-26 19:37:28.882449: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: testm version: 1}
2021-01-26 19:37:28.882485: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: testm version: 1}
2021-01-26 19:37:28.882497: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: testm version: 1}
2021-01-26 19:37:28.882524: I external/ei_for_tf/ei_for_tf/util/saved_model_util.h:75] Reading SavedModel from: /home/ubuntu/testm/1
2021-01-26 19:37:28.940140: I external/ei_for_tf/ei_for_tf/util/saved_model_util.h:98] Reading meta graph with tags { serve }
2021-01-26 19:37:29.069412: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /home/ubuntu/testm/1
2021-01-26 19:37:29.103278: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2021-01-26 19:37:29.169323: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2021-01-26 19:37:29.358026: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
Using Amazon Elastic Inference Client Library Version: 1.7.0                       
Number of Elastic Inference Accelerators Available: 1                
Elastic Inference Accelerator ID: eia-926d6d89ee644dc19d0ada17fcbd423d                                                                                               
Elastic Inference Accelerator Type: eia2.medium                                                                                                              
Elastic Inference Accelerator Ordinal: 0                                                                                                                         
                                                                
[Tue Jan 26 19:37:41 2021, 810560us] [Execution Engine] Error getting application context for [TensorFlow][2]
[Tue Jan 26 19:37:41 2021, 810620us] [Execution Engine][TensorFlow][2] Failed - Last Error:
    EI Error Code: [1, 14, 36]                                     
    EI Error Description: Service not available. Please wait for sometime. Also, make sure the EI setup is correct. Quick setup instructions - https://aws.amazon.com/blogs/machine-learning/launch-ei-accelerators-in-minutes-with-the-amazon-elastic-inference-setup-tool-for-ec2. Setup validation - https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-troubleshooting.html#ei-activation
    EI Request ID:   --  EI Accelerator ID: eia-926d6d89ee644dc19d0ada17fcbd423d                                                                                     
    EI Client Version: 1.7.0                                                                                                                                 
2021-01-26 19:37:41.810700: E external/ei_for_tf/ei_for_tf/graph_optimizer/ei_graph_optimizer.cc:114] Accelerator Doesn't have Variable Support, Exiting   

I am using the amazonei-tensorflow library from this S3 repo.

I believe the machine is set up correctly for Elastic Inference, because when I run the EISetupValidator script as suggested here, it passes cleanly:

ubuntu@xx-xxx-xx-x-xxx:~$ python EISetupValidator.py 
All the validation checks passed for Amazon EI from this instance - i-0ec6f6b0899198c0b

I have found no other references to this error on the internet. Any help or pointers would be appreciated. Thanks in advance.

Tags: python, amazon-web-services, tensorflow2.0, tensorflow-serving

Solution


I got a reply from AWS Support, as follows:

The error "Accelerator Doesn't have Variable Support, Exiting" can be ignored if it is preceded by a connectivity error. The earlier error indicates that the problem appears to be with the EI setup.

More information on EI error codes: https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-error-codes.html

I went through the setup instructions again. My problem was that I had not attached the correct security group to the VPC endpoint created for the accelerator. After fixing that, Elastic Inference worked!
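If you want to check the same thing on your side, the security groups attached to the Elastic Inference VPC endpoint can be inspected and corrected with the AWS CLI. A minimal sketch, assuming the us-east-1 region; the endpoint and security group IDs below are placeholders, not values from my setup:

```shell
# List Elastic Inference VPC endpoints and the security groups attached to them
aws ec2 describe-vpc-endpoints \
    --filters "Name=service-name,Values=com.amazonaws.us-east-1.elastic-inference.runtime" \
    --query "VpcEndpoints[].{Id:VpcEndpointId,Groups:Groups}"

# Attach the security group that allows HTTPS (TCP 443) from the instance
aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id vpce-0123456789abcdef0 \
    --add-security-group-ids sg-0123456789abcdef0
```

The security group on the endpoint must allow inbound TCP 443 from the EC2 instance (the EI client talks to the accelerator over HTTPS through this endpoint); if it does not, you get exactly the "Service not available" connectivity error shown in the log above.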

