How do I set up Ray (distributed programming framework) and a Flask application on AWS Elastic Beanstalk?

Problem description

Background

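I have been trying, without success, to integrate Ray ( https://github.com/ray-project/ray ) into the production version of our AI API. Essentially, we want to use it to speed up one of our clustering algorithms and reduce the latency that occurs with a large number of clusters. Our API is written with Python 3.6, Flask, and NumPy. We use Elastic Beanstalk and Bitbucket Pipelines to make continuous development relatively easy. However, when we recently tried to incorporate Ray, we kept running into a series of errors. Some were build errors from EB, which we fixed by removing enum34 (something we never had to do before); the rest appear to be mod_wsgi errors (again, something we had never run into before). Below is a snippet of the error messages logged on CloudWatch (they repeat after this). I just want to know what I'm doing wrong, how to fix this error, and how to deploy this API correctly.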

Stack trace (from CloudWatch)

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.609052 2020] [suexec:notice] [pid 3248] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.623605 2020] [http2:warn] [pid 3248] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.623616 2020] [http2:warn] [pid 3248] AH02951: mod_ssl does not seem to be enabled

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.624028 2020] [lbmethod_heartbeat:notice] [pid 3248] AH02282: No slotmem from mod_heartmonitor

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.624068 2020] [:warn] [pid 3248] mod_wsgi: Compiled for Python/3.6.2.

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.624072 2020] [:warn] [pid 3248] mod_wsgi: Runtime using Python/3.6.12.

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.625952 2020] [mpm_prefork:notice] [pid 3248] AH00163: Apache/2.4.46 (Amazon) mod_wsgi/3.5 Python/3.6.12 configured -- resuming normal operations

2020-09-15T02:27:57.571-07:00   [Tue Sep 15 09:26:06.625967 2020] [core:notice] [pid 3248] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

2020-09-15T02:32:51.234-07:00   [Tue Sep 15 09:32:50.298983 2020] [mpm_prefork:notice] [pid 3248] AH00169: caught SIGTERM, shutting down

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.353642 2020] [suexec:notice] [pid 7649] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.367142 2020] [so:warn] [pid 7649] AH01574: module wsgi_module is already loaded, skipping

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.368899 2020] [http2:warn] [pid 7649] AH10034: The mpm module (prefork.c) is not supported by mod_http2. The mpm determines how things are processed in your server. HTTP/2 has more demands in this regard and the currently selected mpm will just not do. This is an advisory warning. Your server will continue to work, but the HTTP/2 protocol will be inactive.

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.368908 2020] [http2:warn] [pid 7649] AH02951: mod_ssl does not seem to be enabled

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.369374 2020] [lbmethod_heartbeat:notice] [pid 7649] AH02282: No slotmem from mod_heartmonitor

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.369426 2020] [:warn] [pid 7649] mod_wsgi: Compiled for Python/3.6.2.

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.369431 2020] [:warn] [pid 7649] mod_wsgi: Runtime using Python/3.6.12.

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.377064 2020] [mpm_prefork:notice] [pid 7649] AH00163: Apache/2.4.46 (Amazon) mod_wsgi/3.5 Python/3.6.12 configured -- resuming normal operations

2020-09-15T02:32:52.234-07:00   [Tue Sep 15 09:32:51.377085 2020] [core:notice] [pid 7649] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638019 2020] [:warn] [pid 7659] mod_wsgi (pid=7659): Callback registration for signal 15 ignored.

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638907 2020] [:warn] [pid 7659] File "/opt/python/current/app/run.py", line 5, in <module>

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638917 2020] [:warn] [pid 7659] import ray

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638925 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 971, in _find_and_load

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638931 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638937 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 665, in _load_unlocked

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638942 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap_external>", line 678, in exec_module

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638948 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638954 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/__init__.py", line 81, in <module>

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638957 2020] [:warn] [pid 7659] from ray.worker import (

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638962 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 971, in _find_and_load

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638967 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638973 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 665, in _load_unlocked

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638987 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap_external>", line 678, in exec_module

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638992 2020] [:warn] [pid 7659] File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.638998 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 873, in <module>

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.639001 2020] [:warn] [pid 7659] ray.utils.set_sigterm_handler(sigterm_handler)

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.639006 2020] [:warn] [pid 7659] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/utils.py", line 722, in set_sigterm_handler

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:54.639010 2020] [:warn] [pid 7659] signal.signal(signal.SIGTERM, sigterm_handler)

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:55.203215 2020] [:error] [pid 7659] DEBUG:ray.node:Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-09-15_09-32-55_202688_7659/logs.

2020-09-15T02:32:55.236-07:00   [Tue Sep 15 09:32:55.204027 2020] [:error] [pid 7659] INFO:ray.resource_spec:Starting Ray with 4.44 GiB memory available for workers and up to 2.24 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.309054 2020] [:error] [pid 7659] DEBUG:ray.services:Waiting for redis server at 127.0.0.1:6379 to respond...

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.525174 2020] [:error] [pid 7659] DEBUG:ray.services:Waiting for redis server at 127.0.0.1:19594 to respond...

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.526660 2020] [:error] [pid 7659] DEBUG:ray.services:Starting Redis shard with 0.8 GB max memory.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.663605 2020] [:error] [pid 7659] INFO:ray.services:View the Ray dashboard at \x1b[1m\x1b[32mlocalhost:8265\x1b[39m\x1b[22m

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.665571 2020] [:error] [pid 7659] DEBUG:ray.node:Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-09-15_09-32-55_202688_7659/logs.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.666148 2020] [:error] [pid 7659] DEBUG:ray.services:Determine to start the Plasma object store with 2.41 GB memory using /dev/shm.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.670650 2020] [:error] [pid 7659] DEBUG:ray.services:Determine to start the Plasma object store with 2.41 GB memory using /dev/shm.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696353 2020] [:error] [pid 7659] [remote 127.0.0.1:0] mod_wsgi (pid=7659): Target WSGI script '/opt/python/current/app/run.py' cannot be loaded as Python module.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696417 2020] [:error] [pid 7659] [remote 127.0.0.1:0] mod_wsgi (pid=7659): Exception occurred processing WSGI script '/opt/python/current/app/run.py'.

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696594 2020] [:error] [pid 7659] [remote 127.0.0.1:0] Traceback (most recent call last):

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696630 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/current/app/run.py", line 24, in <module>

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696635 2020] [:error] [pid 7659] [remote 127.0.0.1:0] ray.init(configure_logging=False)

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696643 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 806, in init

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696648 2020] [:error] [pid 7659] [remote 127.0.0.1:0] job_id=job_id)

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696654 2020] [:error] [pid 7659] [remote 127.0.0.1:0] File "/opt/python/run/venv/local/lib/python3.6/site-packages/ray/worker.py", line 1178, in connect

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696659 2020] [:error] [pid 7659] [remote 127.0.0.1:0] faulthandler.enable(all_threads=False)

2020-09-15T02:32:56.238-07:00   [Tue Sep 15 09:32:55.696677 2020] [:error] [pid 7659] [remote 127.0.0.1:0] AttributeError: 'mod_wsgi.Log' object has no attribute 'fileno'

Here is the code in our .ebextensions:

.ebextensions/packages.config

packages:
  yum:
    gcc-c++: []
    python36-devel: []

The actual error here appears to be AttributeError: 'mod_wsgi.Log' object has no attribute 'fileno'.
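
A minimal sketch of how this error can arise, assuming (as the traceback suggests) that mod_wsgi has replaced sys.stderr with a log object that has no fileno() method. FakeWsgiLog is a hypothetical stand-in, not mod_wsgi's real class. faulthandler.enable is the standard-library call that appears at ray/worker.py line 1178 in the traceback; with no file argument it looks up sys.stderr and asks it for a file descriptor.

```python
import faulthandler
import sys


class FakeWsgiLog:
    """Hypothetical stand-in for mod_wsgi.Log: writable, but no fileno()."""

    def write(self, text):
        return len(text)

    def flush(self):
        pass


def enable_with_stderr(stream):
    """Call faulthandler.enable() while sys.stderr is temporarily `stream`.

    Returns None if enabling worked, or the AttributeError message if not.
    """
    original = sys.stderr
    sys.stderr = stream
    try:
        faulthandler.enable(all_threads=False)
    except AttributeError as exc:
        return str(exc)
    finally:
        # Always restore the real stream, whether or not enable() failed.
        sys.stderr = original
    faulthandler.disable()  # undo the successful enable
    return None


message = enable_with_stderr(FakeWsgiLog())
print(message)
```

If this reading is right, the failure is not specific to Ray: any library that calls faulthandler.enable() on import or init would fail the same way under a stderr replacement that lacks fileno().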

.ebextensions/00_commands.config

commands:
  00_setup_pip:
    command: sudo python3 -m pip install --upgrade --force pip
  01_uninstall_enum34:
    command: pip uninstall -y enum34

.ebextensions/00_files.config

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/pre/00_uninstall_enum34.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      rm -f -r /opt/python/run/venv/lib/python3.6/site-packages/enum && rm -f -r /opt/python/run/venv/lib/python3.6/site-packages/enum34-1.1.10.dist-info

run.py

For the Flask application, our run file is written as follows:

#!/usr/bin/env python3
import atexit
import importlib
import logging
import ray
import threading

import settings
import src.routes as routes
from src import app as application


# Commented out because we can't even get the single machine instance working. Forget clusters.
# if settings.ENVIRONMENT == 'PRODUCTION':
#     try:
#         ray.init(address='auto')
#     except Exception:
#         ray.init()
# else:
#     try:
#         ray.init()
#     except Exception as error:
#         raise error

ray.init(configure_logging=False)

# Import and register routes with Flask application
# importlib.import_module('.routes', 'src')
if settings.REPEAT:
    log = logging.getLogger('werkzeug')
    log.setLevel(logging.ERROR)


if __name__ == '__main__':
    # If this service is designated as the repeat service this function will be called every hour.
    application.run('0.0.0.0', port=settings.PORT, debug=True, threaded=settings.THREADED)

We also used to have container_commands to start the Ray cluster, but we removed them so we could sort out the single-instance Ray problem we are having.

bitbucket-pipeline.yaml

image: python:3.7.3

pipelines:
  branches:
    master:
      - step:
          image: atlassian/default-image:2
          name: 'Build and Test'
          script:
            - zip -r careerfair-cluster-service.zip . -x '*.git*'
          # Define an artifact to pass the zip file to the next step
          artifacts:
            - careerfair-cluster-service.zip
      - step:
          image: python:3.7.3
          name: 'Deploy code to elasticbeanstalk'
          caches:
            - pip
          script: # Modify the commands below to build your repository.
            - pip install -U setuptools
            - pip install -U wheel
            - pip install -U flask-cors
            - pip install mod_wsgi-httpd
            - pip install -r requirements.txt
            - pipe: atlassian/aws-elasticbeanstalk-deploy:0.5.5
              variables:
                AWS_ACCESS_KEY_ID: '$AWS_ACCESS_KEY_ID'
                AWS_SECRET_ACCESS_KEY: '$AWS_SECRET_ACCESS_KEY'
                AWS_DEFAULT_REGION: 'us-west-1'
                APPLICATION_NAME: 'careerfair-cluster-service'
                ENVIRONMENT_NAME: 'ClusterServiceProduction'
                ZIP_FILE: 'careerfair-cluster-service.zip'
                ENVIRONMENT: 'PRODUCTION'
                FLASK_APP_PATH: '/opt/python/current/app'
                PORT: 80
      - step:
          image: python:3.7.3
          name: 'Deploy code to elasticbeanstalk worker'
          caches:
            - pip
          script: # Modify the commands below to build your repository.
            - pip install -U setuptools
            - pip install -U wheel
            - pip install -U flask-cors
            - pip install mod_wsgi-httpd
            - pip install -r requirements.txt
            - rm -rf .platform
            - pipe: atlassian/aws-elasticbeanstalk-deploy:0.5.5
              variables:
                AWS_ACCESS_KEY_ID: '$AWS_ACCESS_KEY_ID'
                AWS_SECRET_ACCESS_KEY: '$AWS_SECRET_ACCESS_KEY'
                AWS_DEFAULT_REGION: 'us-west-1'
                APPLICATION_NAME: 'careerfair-cluster-service'
                ENVIRONMENT_NAME: 'ClusterServiceRepeatWorker'
                ZIP_FILE: 'careerfair-cluster-service.zip'
                ENVIRONMENT: 'PRODUCTION'
                VERSION_LABEL: 'cluster-service-$BITBUCKET_BUILD_NUMBER-repeater'
                FLASK_APP_PATH: '/opt/python/current/app'
                THREADED: 'False'
                PORT: 80

wsgi.py

import sys


sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
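
The traceback names run.py as the WSGI script, so it is not clear from the question whether this wsgi.py is ever imported before ray.init runs. Below is a minimal sketch of one possible workaround, which the source does not confirm fixes the deployment: before calling ray.init, check that sys.stderr exposes a fileno() and swap in a stream that does if not. ensure_stderr_fileno and has_fileno are hypothetical helpers, and whether sys.__stderr__ is a real file under mod_wsgi is an assumption that would need checking on the instance.

```python
import sys


def has_fileno(stream):
    """Return True if `stream` exposes a usable file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, OSError, ValueError):
        # io.UnsupportedOperation subclasses both OSError and ValueError.
        return False
    return True


def ensure_stderr_fileno():
    """Hypothetical helper: point sys.stderr at a stream with a real fd.

    Tries the interpreter's original stderr first, then falls back to
    opening file descriptor 2 directly. Returns the stream now in use.
    """
    if has_fileno(sys.stderr):
        return sys.stderr
    if has_fileno(sys.__stderr__):
        sys.stderr = sys.__stderr__
    else:
        # closefd=False so closing this wrapper never closes fd 2 itself.
        sys.stderr = open(2, "w", closefd=False)
    return sys.stderr


# In run.py this call would go before `ray.init(...)`.
stream = ensure_stderr_fileno()
```

Anything written to the replacement stream bypasses mod_wsgi's log wrapper, so where those messages end up depends on how Apache was started.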

Tags: python-3.x, amazon-web-services, flask, amazon-elastic-beanstalk, ray
