首页 > 解决方案 > sqlachemy 执行查询比正常运行查询慢

问题描述

我在 db Postgres 12.3 中使用 flask 1.1.2、sqlachemy 1.3.18 和 flask-sqlalchemy 2.4.4 进行查询,但是当通过查询生成器使用来自 db 的查询数据时,我遇到了一个问题,它比执行慢 10 -20 倍这个查询(从生成器生成)直接通过 pgadmin 或 anyelse 工具数据库,尽管这个相同的数据库相同的查询。这是我的代码:

base_model.py

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

class Base(db.Model):
  __abstract__ = True

  def __init__(self, data):
    for key, item in data.items():
        setattr(self, key, item)

  @classmethod
  def get_all_per_column(cls, column, order=None):
    column = getattr(cls, column, None)

    query = cls.query.with_entities(cls.__mapper__.primary_key[0], column).distinct()
    if order:
        query = query.order_by(order)
    else:
        query = query.order_by(column)
    res = query.all()
    return res

a_model.py

class A(Base):
   __tablename__ = 'table_a'

   @classmethod
   def query_join_and_filter_many_tables_in_db(cls, filters=None):
       query = cls.query.with_entities(<list column in table a>, <list column in table b>, <list column table c>, ...)
       query = query.join(table_b)
       query = query.join(table_c)
       .... # Join about ~10 table is here
       query = query.filter(<>)
       list_result = query.offset(0).limit(20).all()

       # Get full query
       # full_query = query.statement.compile(compile_kwargs={"literal_binds": True})
       return list_result

主文件

from flask import Flask
from sqlalchemy import create_engine
from sqlalchemy import event
from sqlalchemy.engine import Engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from base_mode import db

Base = declarative_base()
DBSession = scoped_session(sessionmaker())
engine = create_engine(os.getenv('DB_URI'), echo_pool=True)
DBSession.remove()
DBSession.configure(bind=engine, autoflush=True, expire_on_commit=False)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
db.init_app(app)

#### use check time excute ####
@event.listens_for(Engine, "before_cursor_execute")
def before_cursor_execute(conn, cursor, statement,
                          parameters, context, executemany):
    conn.info.setdefault('query_start_time', []).append(time.time())
    logger.debug("Start Query: %s", statement)


@event.listens_for(Engine, "after_cursor_execute")
def after_cursor_execute(conn, cursor, statement,
                         parameters, context, executemany):
    total = time.time() - conn.info['query_start_time'].pop(-1)
    logger.debug("Query Complete!")
    logger.debug("Total Time: %f", total)

这是我的日志:

2020-11-26 14:27:25 - p13 - (myapp.sqltime:84) - DEBUG: Start Query: <QUERY #if bring it to db tools this query run very fast> 
2020-11-26 14:27:25 - p13 - (myapp.sqltime:91) - DEBUG: Query Complete!
2020-11-26 14:27:25 - p13 - (myapp.sqltime:92) - DEBUG: Total Time: 143.593255 #in second

但是在 db 工具上运行相同的查询只需要 2 秒。

当我使用sqlalchemy-profiled运行时。我有以下日志:

运行完整的查询生成器日志

345197 function calls (333631 primitive calls) in 144.396 seconds

   Ordered by: cumulative time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000  143.366   71.683 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3498(__iter__)
        2    0.000    0.000  143.338   71.669 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3523(_execute_and_instances)
        5    0.000    0.000  143.273   28.655 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:946(execute)
        2    0.000    0.000  143.267   71.633 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py:296(_execute_on_connection)
        2    0.000    0.000  143.267   71.633 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1082(_execute_clauseelement)
        5    0.000    0.000  143.183   28.637 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1189(_execute_context)
        9  143.167   15.907  143.167   15.907 {method 'execute' of 'psycopg2.extensions.cursor' objects}
        7    0.000    0.000  143.162   20.452 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py:592(do_execute)
        1    0.000    0.000  142.788  142.788 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3325(all)
        1    0.000    0.000    0.818    0.818 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3715(count)
      112    0.001    0.000    0.628    0.006 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/base.py:222(generate)
        1    0.000    0.000    0.583    0.583 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3471(scalar)
        1    0.000    0.000    0.583    0.583 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3439(one)
        1    0.000    0.000    0.583    0.583 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:3403(one_or_none)
4739/4092    0.016    0.000    0.409    0.000 /usr/local/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py:880(__get__)
       61    0.001    0.000    0.372    0.006 /usr/local/lib/python3.8/site-packages/sqlalchemy/orm/query.py:2087(join)

使用原始查询运行:

    16935 function calls (15669 primitive calls) in 153.674 seconds
       Ordered by: cumulative time
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000  153.620  153.620 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:2217(execute)
            1    0.000    0.000  153.603  153.603 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:946(execute)
            1    0.000    0.000  153.603  153.603 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1164(_execute_text)
            1    0.000    0.000  153.603  153.603 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1189(_execute_context)
            1    0.000    0.000  153.592  153.592 /usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py:592(do_execute)
            1  153.592  153.592  153.592  153.592 {method 'execute' of 'psycopg2.extensions.cursor' objects}
            1    0.000    0.000    0.052    0.052 <string>:1(<lambda>)
            1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py:412(compile)
            1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/elements.py:478(_compiler)
            1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:527(__init__)
            1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:274(__init__)
          6/1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:349(process)
        823/1    0.004    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/visitors.py:86(_compiler_dispatch)
         14/1    0.000    0.000    0.052    0.052 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:2045(visit_select)
           14    0.001    0.000    0.035    0.002 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:2127(<listcomp>)
          159    0.002    0.000    0.034    0.000 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:1799(_label_select_column)
         14/1    0.000    0.000    0.020    0.020 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:2233(_compose_select_body)
         14/1    0.000    0.000    0.019    0.019 /usr/local/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py:2252(<listcomp>)

执行查询需要 153 秒。(虽然直接用工具运行只需要2s)

感谢帮助!

PS:还有一件事,当我用旧数据库中的完整数据克隆新数据库时,然后将我的代码指向新数据库,这段代码可以作为直接查询快速工作。我重新启动旧数据库然后再次指向代码 => 仍然像以前一样缓慢运行(该死)(生气)。

更新: 当我通过select * from pg_stats_activity获取所有这些查询通过代码执行检查数据库时,状态处于活动状态(headbang)

UPDATE 2 here解释了在2个相同的数据库中分析相同的查询(克隆) 慢速查询 快速查询

任何人都可以为我解释这一点。以及如何修复它(无需克隆 newdb)。

最终更新:我已经通过 2 个步骤优化了这个查询,并且我将执行时间减少了约 20 倍。

标签: pythonpostgresqlsqlalchemyflask-sqlalchemypsycopg2

解决方案


看起来这些混淆查询并不相同。我在快速查询中找不到任何过滤器,看起来像这样(来自慢速查询):

加入过滤器:(juliet_quebec(yankee_uniform((quebec_foxtrot.four_three)::text), 'foxtrot_whiskey'::text, ''whiskey_two'([\hotel_quebec]+)$'juliet_foxtrot''::text)))

这看起来像一个正则表达式。你能深入研究一下吗?并将其与原始的慢速和快速查询进行比较?

india_three 也很耗时,每次 seq 扫描都很快,但它是在许多循环中完成的(198602,两次)。

有什么方法可以向我们展示真实的(混淆的)查询?


推荐阅读