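python-3.x - How do I set up Ray (a distributed programming framework) with a Flask application on AWS Elastic Beanstalk?
Problem description
Background
I have been trying, without success, to integrate Ray (https://github.com/ray-project/ray) into the production version of our AI API. Essentially, we want to use it to speed up one of our clustering algorithms and reduce the latency that occurs with a large number of clusters. Our API is written with Python 3.6, Flask, and NumPy. We use Elastic Beanstalk and Bitbucket Pipelines to make continuous development relatively easy. However, since we recently tried to incorporate Ray, we keep running into a series of errors. Some were build errors from EB, which we fixed by removing enum34 (something we had never had to do before); the rest appear to be mod_wsgi errors (again, something we had never encountered before). Below is a snippet of the error messages logged in CloudWatch (it repeats after this point). I just want to know what I am doing wrong, how to fix this error, and how to deploy this API properly.
Stack trace (from CloudWatch)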
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.609052 2020] [suexec:notice] [pid 3248] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.623605 2020] [http2:warn] [pid 3248] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.623616 2020] [http2:warn] [pid 3248] AH02951: mod_ssl does not seem to be enabled
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.624028 2020] [lbmethod_heartbeat:notice] [pid 3248] AH02282: No slotmem from mod_heartmonitor
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.624068 2020] [:warn] [pid 3248] mod_wsgi: Compiled for Python/3.6.2.
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.624072 2020] [:warn] [pid 3248] mod_wsgi: Runtime using Python/3.6.12.
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.625952 2020] [mpm_prefork:notice] [pid 3248] AH00163: Apache/2.4.46 (Amazon) mod_wsgi/3.5 Python/3.6.12 configured -- resuming normal operations
2020-09-15T02:27:57.571-07:00 [Tue Sep 15 09:26:06.625967 2020] [core:notice] [pid 3248] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
2020-09-15T02:32:51.234-07:00 [Tue Sep 15 09:32:50.298983 2020] [mpm_prefork:notice] [pid 3248] AH00169: caught SIGTERM, shutting down
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.353642 2020] [suexec:notice] [pid 7649] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.367142 2020] [so:warn] [pid 7649] AH01574: module wsgi_module is already loaded, skipping
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.368899 2020] [http2:warn] [pid 7649] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.368908 2020] [http2:warn] [pid 7649] AH02951: mod_ssl does not seem to be enabled
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.369374 2020] [lbmethod_heartbeat:notice] [pid 7649] AH02282: No slotmem from mod_heartmonitor
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.369426 2020] [:warn] [pid 7649] mod_wsgi: Compiled for Python/3.6.2.
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.369431 2020] [:warn] [pid 7649] mod_wsgi: Runtime using Python/3.6.12.
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.377064 2020] [mpm_prefork:notice] [pid 7649] AH00163: Apache/2.4.46 (Amazon) mod_wsgi/3.5 Python/3.6.12 configured -- resuming normal operations
2020-09-15T02:32:52.234-07:00 [Tue Sep 15 09:32:51.377085 2020] [core:notice] [pid 7649] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638019 2020] [:warn] [pid 7659] mod_wsgi (pid=7659): Callback registration for signal 15 ignored.
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638907 2020] [:warn] [pid 7659] File "/opt/python/current/app/run.py", line 5, in <module>
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638917 2020] [:warn] [pid 7659] import ray
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638925 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 971, in _find_and_load
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638931 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638937 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638942 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap_external>", line 678, in exec_module
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638948 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638954 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/__init__.py", line 81, in <module>
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638957 2020] [:warn] [pid 7659] from ray.worker import (
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638962 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 971, in _find_and_load
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638967 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638973 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638987 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap_external>", line 678, in exec_module
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638992 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.638998 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 873, in <module>
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.639001 2020] [:warn] [pid 7659] ray.utils.set_sigterm_handler(sigterm_handler)
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.639006 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/utils.py", line 722, in set_sigterm_handler
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:54.639010 2020] [:warn] [pid 7659] signal.signal(signal.SIGTERM, sigterm_handler)
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:55.203215 2020] [:error] [pid 7659] DEBUG:ray.node:Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-09-15_09-32-55_202688_7659/logs.
2020-09-15T02:32:55.236-07:00 [Tue Sep 15 09:32:55.204027 2020] [:error] [pid 7659] INFO:ray.resource_spec:Starting Ray with 4.44 GiB memory available for workers and up to 2.24 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.309054 2020] [:error] [pid 7659] DEBUG:ray.services:Waiting for redis server at 127.0.0.1:6379 to respond...
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.525174 2020] [:error] [pid 7659] DEBUG:ray.services:Waiting for redis server at 127.0.0.1:19594 to respond...
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.526660 2020] [:error] [pid 7659] DEBUG:ray.services:Starting Redis shard with 0.8 GB max memory.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.663605 2020] [:error] [pid 7659] INFO:ray.services:View the Ray dashboard at \x1b[1m\x1b[32mlocalhost:8265\x1b[39m\x1b[22m
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.665571 2020] [:error] [pid 7659] DEBUG:ray.node:Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-09-15_09-32-55_202688_7659/logs.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.666148 2020] [:error] [pid 7659] DEBUG:ray.services:Determine to start the Plasma object store with 2.41 GB memory using /dev/shm.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.670650 2020] [:error] [pid 7659] DEBUG:ray.services:Determine to start the Plasma object store with 2.41 GB memory using /dev/shm.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696353 2020] [:error] [pid 7659] [remote 127.0.0.1:0] mod_wsgi (pid=7659): Target WSGI script '/opt/python/current/app/run.py' cannot be loaded as Python module.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696417 2020] [:error] [pid 7659] [remote 127.0.0.1:0] mod_wsgi (pid=7659): Exception occurred processing WSGI script '/opt/python/current/app/run.py'.
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696594 2020] [:error] [pid 7659] [remote 127.0.0.1:0] Traceback (most recent call last):
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696630 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/current/app/run.py", line 24, in <module>
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696635 2020] [:error] [pid 7659] [remote 127.0.0.1:0] ray.init(configure_logging=False)
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696643 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 806, in init
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696648 2020] [:error] [pid 7659] [remote 127.0.0.1:0] job_id=job_id)
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696654 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 1178, in connect
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696659 2020] [:error] [pid 7659] [remote 127.0.0.1:0] faulthandler.enable(all_threads=False)
2020-09-15T02:32:56.238-07:00 [Tue Sep 15 09:32:55.696677 2020] [:error] [pid 7659] [remote 127.0.0.1:0] AttributeError: 'mod_wsgi.Log' object has no attribute 'fileno'
Here is the code we have in .ebextensions:
.ebextensions/packages.config
packages:
  yum:
    gcc-c++: []
    python36-devel: []
The actual error here appears to be AttributeError: 'mod_wsgi.Log' object has no attribute 'fileno'.
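The last frame of the traceback is faulthandler.enable(all_threads=False), which writes to sys.stderr by default and needs that object to expose fileno(). Under mod_wsgi, sys.stderr is a mod_wsgi.Log object, which, according to the error message, does not have one. The sketch below reproduces the same kind of failure outside Apache. LogLikeStream is a hypothetical stand-in for mod_wsgi.Log, not its real implementation.

```python
import faulthandler
import sys


class LogLikeStream:
    """Hypothetical stand-in for mod_wsgi.Log: writable, but no fileno()."""

    def write(self, text):
        return len(text)

    def flush(self):
        pass


def reproduce_fileno_error():
    """Call faulthandler.enable() while sys.stderr lacks fileno().

    Returns the error message, or None if no AttributeError was raised.
    """
    saved_stderr = sys.stderr
    sys.stderr = LogLikeStream()
    try:
        # The same call that appears in ray/worker.py in the traceback.
        faulthandler.enable(all_threads=False)
    except AttributeError as error:
        return str(error)
    finally:
        # Always restore the real stream, whatever happened above.
        sys.stderr = saved_stderr
    return None


if __name__ == "__main__":
    print(reproduce_fileno_error())
```

If this reading is right, the failure comes from the stream that mod_wsgi installs rather than from the Ray installation itself.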
.ebextensions/00_commands.config
commands:
  00_setup_pip:
    command: sudo python3 -m pip install --upgrade --force pip
  01_uninstall_enum34:
    command: pip uninstall -y enum34
.ebextensions/00_files.config
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/00_uninstall_enum34.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      rm -f -r /opt/python/run/venv/lib/python3.6/site-packages/enum && rm -f -r /opt/python/run/venv/lib/python3.6/site-packages/enum34-1.1.10.dist-info
run.py
For the Flask application, our run file is written as follows:
#!/usr/bin/env python3
import atexit
import importlib
import logging
import ray
import threading

import settings
import src.routes as routes
from src import app as application

# Commented out because we can't even get the single machine instance working. Forget clusters.
# if settings.ENVIRONMENT == 'PRODUCTION':
#     try:
#         ray.init(address='auto')
#     except Exception:
#         ray.init()
# else:
#     try:
#         ray.init()
#     except Exception as error:
#         raise error

ray.init(configure_logging=False)

# Import and register routes with Flask application
# importlib.import_module('.routes', 'src')

if settings.REPEAT:
    log = logging.getLogger('werkzeug')
    log.setLevel(logging.ERROR)

if __name__ == '__main__':
    # If this service is designated as the repeat service this function will be called every hour.
    application.run('0.0.0.0', port=settings.PORT, debug=True, threaded=settings.THREADED)
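The commented-out block in run.py describes the intended startup order: in production, join an already running cluster, and fall back to a local instance if none is found. A minimal sketch of that logic follows. init_ray is a hypothetical helper, not part of Ray. It takes the ray module as an argument so that it can be exercised without a cluster, and it assumes that a missing cluster surfaces as ConnectionError, which should be checked against the installed Ray version. It does not address the mod_wsgi error shown in the stack trace.

```python
def init_ray(ray_module, production):
    """Initialise Ray at most once.

    Returns 'existing' if Ray was already running in this process,
    'cluster' if an existing cluster was joined, or 'local' otherwise.
    """
    if ray_module.is_initialized():
        return "existing"
    if production:
        try:
            # address="auto" asks Ray to find a cluster started on this machine.
            ray_module.init(address="auto")
            return "cluster"
        except ConnectionError:
            # Assumption: no running cluster is reported as ConnectionError.
            pass
    ray_module.init()
    return "local"
```

In run.py this could be called as init_ray(ray, settings.ENVIRONMENT == 'PRODUCTION') in place of the bare ray.init(...) call.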
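We also used to have container_commands that started the Ray cluster, but we have since removed them so that we could troubleshoot the single-instance Ray problem we are running into.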
bitbucket-pipeline.yaml
image: python:3.7.3

pipelines:
  branches:
    master:
      - step:
          image: atlassian/default-image:2
          name: 'Build and Test'
          script:
            - zip -r careerfair-cluster-service.zip . -x '*.git*'
          # Define an artifact to pass the zip file to the next step
          artifacts:
            - careerfair-cluster-service.zip
      - step:
          image: python:3.7.3
          name: 'Deploy code to elasticbeanstalk'
          caches:
            - pip
          script: # Modify the commands below to build your repository.
            - pip install -U setuptools
            - pip install -U wheel
            - pip install -U flask-cors
            - pip install mod_wsgi-httpd
            - pip install -r requirements.txt
            - pipe: atlassian/aws-elasticbeanstalk-deploy:0.5.5
              variables:
                AWS_ACCESS_KEY_ID: '$AWS_ACCESS_KEY_ID'
                AWS_SECRET_ACCESS_KEY: '$AWS_SECRET_ACCESS_KEY'
                AWS_DEFAULT_REGION: 'us-west-1'
                APPLICATION_NAME: 'careerfair-cluster-service'
                ENVIRONMENT_NAME: 'ClusterServiceProduction'
                ZIP_FILE: 'careerfair-cluster-service.zip'
                ENVIRONMENT: 'PRODUCTION'
                FLASK_APP_PATH: '/opt/python/current/app'
                PORT: 80
      - step:
          image: python:3.7.3
          name: 'Deploy code to elasticbeanstalk worker'
          caches:
            - pip
          script: # Modify the commands below to build your repository.
            - pip install -U setuptools
            - pip install -U wheel
            - pip install -U flask-cors
            - pip install mod_wsgi-httpd
            - pip install -r requirements.txt
            - rm -rf .platform
            - pipe: atlassian/aws-elasticbeanstalk-deploy:0.5.5
              variables:
                AWS_ACCESS_KEY_ID: '$AWS_ACCESS_KEY_ID'
                AWS_SECRET_ACCESS_KEY: '$AWS_SECRET_ACCESS_KEY'
                AWS_DEFAULT_REGION: 'us-west-1'
                APPLICATION_NAME: 'careerfair-cluster-service'
                ENVIRONMENT_NAME: 'ClusterServiceRepeatWorker'
                ZIP_FILE: 'careerfair-cluster-service.zip'
                ENVIRONMENT: 'PRODUCTION'
                VERSION_LABEL: 'cluster-service-$BITBUCKET_BUILD_NUMBER-repeater'
                FLASK_APP_PATH: '/opt/python/current/app'
                THREADED: 'False'
                PORT: 80
wsgi.py
import sys
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
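One thing worth checking about the wsgi.py reset: the stack trace shows mod_wsgi loading /opt/python/current/app/run.py as the WSGI script, and run.py as posted never imports wsgi.py, so the reset may never run before ray.init(). A minimal sketch of a more defensive variant is below, assuming the goal is only to give faulthandler a stream with a working fileno(). ensure_fileno_streams is a hypothetical helper name, and whether this is enough for Ray to run under mod_wsgi has not been verified here.

```python
import os
import sys


def has_usable_fileno(stream):
    """Return True if stream.fileno() exists and succeeds."""
    try:
        stream.fileno()
    except (AttributeError, OSError, ValueError):
        # Covers a missing attribute, a None stream, closed files,
        # and io.UnsupportedOperation.
        return False
    return True


def ensure_fileno_streams():
    """Replace sys.stdout / sys.stderr if they cannot provide a descriptor."""
    for name in ("stdout", "stderr"):
        if has_usable_fileno(getattr(sys, name)):
            continue
        original = getattr(sys, "__" + name + "__")
        if original is not None and has_usable_fileno(original):
            # Same idea as the wsgi.py above: go back to the interpreter's
            # original stream.
            setattr(sys, name, original)
        else:
            # Last resort: discard output rather than fail on fileno().
            setattr(sys, name, open(os.devnull, "w"))
```

The call would have to come at the top of run.py, before ray.init(configure_logging=False). Note that output written to the replaced streams may no longer appear in the Apache error log in the same way.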