TensorFlow sess.run() does not execute in a route function when the app is hosted under uWSGI

Problem description

I have the following app.py file:

import json

from flask import Flask
import numpy as np
from dssm.model_dense_ngram import *  # presumably provides tf, x, y and NO_OF_TRIGRAMS

app = Flask(__name__)

sess = tf.compat.v1.Session()
init = tf.compat.v1.global_variables_initializer()
sess.run(init)
print("making representation")
representation, = sess.run([y], feed_dict={x: np.zeros((1, NO_OF_TRIGRAMS))})
print("Sum of representation: {}".format(np.sum(representation)))

def get_representation():
    print("Making representation")
    representation, = sess.run([y], feed_dict={x: np.zeros((1, NO_OF_TRIGRAMS))})
    print("Made representation")
    return np.sum(representation)


# We call the API like: localhost:5000/neuralSearch/
@app.route("/neuralSearch")
def get_neural_search():
    return json.dumps({
        "result": get_representation(),
    }, indent=4)

I host it in a Docker container behind nginx and uWSGI. This is the Dockerfile:

FROM maven:3.6.3-jdk-11

RUN apt-get clean \
    && apt-get -y update
RUN apt-get -y install python3.7

RUN apt-get -y install nginx \
    && apt-get -y install python3-dev \
    && apt-get -y install build-essential
RUN apt-get -y install python3-setuptools

RUN apt -y install python3-pip
RUN pip3 install --upgrade pip
RUN pip3 install wheel
RUN apt-get -y install libpcre3 libpcre3-dev
RUN pip3 install uwsgi

RUN mkdir -p /srv/flask_app

COPY dssm /srv/flask_app/dssm
COPY uwsgi.ini /srv/flask_app
COPY requirements.txt /srv/flask_app
COPY start.sh /srv/flask_app
COPY wsgi.py /srv/flask_app
COPY app.py /srv/flask_app/app.py
WORKDIR /srv/flask_app

RUN pip install -r requirements.txt --src /usr/local/src

RUN rm /etc/nginx/sites-enabled/default
RUN rm -r /root/.cache

COPY nginx.conf /etc/nginx/
RUN chmod +x ./start.sh

ENV FLASK_APP app.py
ENV NEURALSEARCH_TRIGRAMS_PATH /srv/preprocessed_datasets/trigrams.txt
ENV CONFLUENCE_INDICES_FILE /srv/preprocessed_datasets/confluence/data.csv
ENV CONFLUENCE_TEXT_FILE /srv/preprocessed_datasets/confluence/mid.json

EXPOSE 80

ENTRYPOINT ["./start.sh"]

I build the image with `docker build . -t flask_image` and run the container with `docker run --name flask_container -p 80:80 flask_image`. When I run the container, I get the following output:

Starting nginx: nginx.
[uWSGI] getting INI configuration from uwsgi.ini
*** Starting uWSGI 2.0.17.1 (64bit) on [Wed Jul 29 22:33:23 2020] ***
compiled with version: 8.3.0 on 29 July 2020 22:31:17
os: Linux-4.19.76-linuxkit #1 SMP Tue May 26 11:42:35 UTC 2020
nodename: 2a28f8711a05
machine: x86_64
clock source: unix
pcre jit disabled
detected number of CPU cores: 4
current working directory: /srv/flask_app
detected binary path: /usr/local/bin/uwsgi
setgid() to 33
setuid() to 33
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uwsgi socket 0 bound to UNIX address /tmp/uwsgi.socket fd 3
Python version: 3.7.3 (default, Dec 20 2019, 18:57:59)  [GCC 8.3.0]
*** Python threads support is disabled. You can enable it with --enable-threads ***
Python main interpreter initialized at 0x558ebb239230
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 437424 bytes (427 KB) for 5 cores
*** Operational MODE: preforking ***
2020-07-29 22:33:24.500245: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-07-29 22:33:24.500335: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-29 22:33:26.270527: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-07-29 22:33:26.270706: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-07-29 22:33:26.270745: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (2a28f8711a05): /proc/driver/nvidia/version does not exist
2020-07-29 22:33:26.271023: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-07-29 22:33:26.279470: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2492630000 Hz
2020-07-29 22:33:26.280063: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558ebe1eb190 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-29 22:33:26.280148: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
making representation
Sum of representation: -0.7329927682876587
WSGI app 0 (mountpoint='') ready in 3 seconds on interpreter 0x558ebb239230 pid: 23 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 23)
spawned uWSGI worker 1 (pid: 47, cores: 1)
spawned uWSGI worker 2 (pid: 48, cores: 1)
spawned uWSGI worker 3 (pid: 49, cores: 1)
spawned uWSGI worker 4 (pid: 50, cores: 1)
spawned uWSGI worker 5 (pid: 51, cores: 1)

As the line `Sum of representation: -0.7329927682876587` shows, the first `sess.run()` call in app.py executes successfully.

However, if I call the `/neuralSearch` endpoint, execution hangs at the `sess.run()` call inside `get_representation()`. I get the following output:

Making representation

Nothing more; the server never returns a response, it just freezes there. Why does this happen, and how can I fix it?

Other files that may be relevant:

nginx.conf, which configures the nginx server:

user www-data;
worker_processes auto;
pid /run/nginx.pid;

events {
    worker_connections 1024;
    use epoll;
    multi_accept on;
}

http {
    access_log /dev/stdout;
    error_log /dev/stdout;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    index   index.html index.htm;

    server {
        listen       80 default_server;
        listen       [::]:80 default_server;
        server_name  localhost;
        root         /var/www/html;

        location / {
            include uwsgi_params;
            uwsgi_pass unix:/tmp/uwsgi.socket;
            uwsgi_read_timeout 1h;
            uwsgi_send_timeout 1h;
            proxy_read_timeout 1h;
            proxy_send_timeout 1h;
        }
    }
}

wsgi.py

from app import app

uwsgi.ini

[uwsgi]
module = wsgi:app
uid = www-data
gid = www-data
master = true
processes = 5

socket = /tmp/uwsgi.socket
chmod-sock = 664
vacuum = true

die-on-term = true

start.sh

#!/usr/bin/env bash
service nginx start
uwsgi --ini uwsgi.ini

Edit: output of `docker container top <container name>`:

$ docker container top 2a28f8711a05
PID                 USER                TIME                COMMAND
65246               root                0:00                bash ./start.sh
65292               root                0:00                nginx: master process /usr/sbin/nginx
65293               xfs                 0:00                nginx: worker process
65294               xfs                 0:00                nginx: worker process
65295               xfs                 0:00                nginx: worker process
65296               xfs                 0:00                nginx: worker process
65297               xfs                 0:03                uwsgi --ini uwsgi.ini
65321               xfs                 0:00                uwsgi --ini uwsgi.ini
65322               xfs                 0:00                uwsgi --ini uwsgi.ini
65323               xfs                 0:00                uwsgi --ini uwsgi.ini
65324               xfs                 0:00                uwsgi --ini uwsgi.ini
65325               xfs                 0:00                uwsgi --ini uwsgi.ini

Tags: docker, tensorflow, nginx, flask, wsgi

Solution
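
This is a classic symptom of forking a process that already holds a TensorFlow session. uwsgi.ini sets `processes = 5` without `lazy-apps`, so uWSGI imports app.py once in the master process (which is why the first `sess.run()` succeeds and `Sum of representation` is printed) and then `fork()`s the five workers. The session's internal thread pools and locks do not survive the fork, so the first `sess.run()` inside a worker blocks forever. Note also the log line `*** Python threads support is disabled ***`. A common fix, sketched here but not tested against this exact setup, is to make each worker load the application itself and to enable Python threads:

```ini
[uwsgi]
module = wsgi:app
uid = www-data
gid = www-data
master = true
processes = 5

# Import app.py in each worker instead of only in the master,
# so every worker creates its own TensorFlow session after the fork.
lazy-apps = true

# uWSGI disables Python threads by default ("Python threads support
# is disabled" in the startup log); TensorFlow needs them.
enable-threads = true

socket = /tmp/uwsgi.socket
chmod-sock = 664
vacuum = true

die-on-term = true
```

Be aware that `lazy-apps = true` loads the model once per worker, multiplying memory use by the worker count. An alternative with the same effect is to create the session lazily on first use inside `get_representation()`, or in a function decorated with `@postfork` from the `uwsgidecorators` module, so that no TensorFlow state exists before the fork.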

