首页 > 解决方案 > 将列添加到 SELECT 后,PostgreSQL 左连接变得非常慢

问题描述

我习惯于使用物化视图,直到现在我还没有遇到任何性能问题。我正在使用 PostgreSQL v9.4.5。因此,当我LEFT JOIN使用表格对物化视图进行简单操作时,一切正常,直到选择一定数量的列(目前总共 56 列)。当我在语句中添加一个额外的列时SELECT,执行时间从 3 秒跳到 5 分钟以上,这很奇怪且不可接受。

我已经尝试过使用不同的列组合,它似乎真的取决于数量而不是类型或列中的数据。然后我使用该EXPLAIN语句来查看计划者在这两种情况下都在做什么,我发现

Hash Left Join  (cost=115615.56..501280.28 rows=2159465 width=400) (actual time=1696.312..5210.677 rows=2159465 loops=1)
  Hash Cond: (mv.fkey = tb.id)
  Buffers: shared hit=132206, temp read=45756 written=45754
  ->  Seq Scan on mv (cost=0.00..125290.65 rows=2159465 width=372) (actual time=0.005..288.453 rows=2159465 loops=1)
        Buffers: shared hit=103696
  ->  Hash  (cost=58202.25..58202.25 rows=2969225 width=28) (actual time=1693.613..1693.613 rows=2969225 loops=1)
        Buckets: 524288  Batches: 2  Memory Usage: 89300kB
        Buffers: shared hit=28510, temp written=8987
        ->  Seq Scan on tb  (cost=0.00..58202.25 rows=2969225 width=28) (actual time=0.010..585.523 rows=2969225 loops=1)
              Buffers: shared hit=28510
Planning time: 0.376 ms
Execution time: 5316.682 ms

变成

Hash Right Join  (cost=257726.96..505730.81 rows=2159465 width=408) (actual time=13991.355..810029.353 rows=2159465 loops=1)
  Hash Cond: (tb.id = mv.fkey)
  Buffers: shared hit=132206, temp read=368900 written=237830
  ->  Seq Scan on tb  (cost=0.00..58202.25 rows=2969225 width=36) (actual time=0.006..631.307 rows=2969225 loops=1)
        Buffers: shared hit=28510
  ->  Hash  (cost=125290.65..125290.65 rows=2159465 width=372) (actual time=9442.991..9442.991 rows=2159465 loops=1)
        Buckets: 65536  Batches: 131072 (originally 8)  Memory Usage: 204367kB
        Buffers: shared hit=103696, temp written=61097
        ->  Seq Scan on mv  (cost=0.00..125290.65 rows=2159465 width=372) (actual time=0.005..357.327 rows=2159465 loops=1)
              Buffers: shared hit=103696
Planning time: 0.329 ms
Execution time: 810230.707 ms

我不明白为什么会发生散列左连接和散列右连接之间的变化,也不明白它是否与问题相关。在 tb.id 和 mv.fkey 上创建索引可以提高性能,但仅持续大约 1 分钟。这是我非常简单的查询:

SELECT mv.*, tb.id, tb.ts, tb.f_score, tb.adress, tb.score, tb.decision, tb.rating 
-- , tb.rating_level -- that one changes everything
FROM mv 
LEFT JOIN tb on tb.id = mv.fkey

这是 DDL:

CREATE TABLE tb (
    id int4 NOT NULL,
    ts timestamp NULL,
    f_score int4 NULL,
    adress text NULL,
    decision text NULL,
    score int4 NULL,
    rating text NULL,
    rating_level text NULL,
    risk_indicator int4 NULL,
    CONSTRAINT tb_pk PRIMARY KEY (id)
)

CREATE TABLE mv (
    request_id int4 NOT NULL,
    customer_id text NULL,
    customer_category text NULL,
    id_document_type text NULL,
    nationality text NULL,
    gender text NULL,
    date_of_birth date NULL,
    active_postpaid_subscriptions_count int4 NULL,
    accumulated_customer_recurring_charges_value text NULL,
    agreement_start_date timestamp NULL,
    latest_agreement_start_date timestamp NULL,
    bill_payment_method text NULL,
    blacklisted_fraud text NULL,
    blacklisted_collection text NULL,
    created_at timestamp NULL,
    country text NULL,
    city text NULL,
    zip text NULL,
    order_id text NULL,
    order_type text NULL,
    includes_mnp int4 NULL,
    dealer_chain_code text NULL,
    dealer_chain_name text NULL,
    typed text NULL,
    order_channel text NULL,
    is_blacklisted int4 NULL,
    is_whitelisted int4 NULL,
    response_id int4 NULL,
    order_score int4 NULL,
    state text NULL,
    business_instruction_template_names text NULL,
    fkey int4 NULL,
    response_received int4 NULL,
    orderer_type text NULL,
    internal_score int4 NULL,
    external_final_score int4 NULL,
    final_score int4 NULL,
    payment_type text NULL,
    installment text NULL,
    simo text NULL,
    price_plan_market text NULL,
    product_name text NULL,
    one_time_fee numeric NULL,
    one_time_discount numeric NULL,
    recurring_fee float8 NULL,
    recurring_discount float8 NULL,
    down_payment_amount text NULL,
    allowed_postpaid_subscription_count int4 NULL,
    allowed_postpaid_with_prepayment_subscription_count int4 NULL,
    allowed_payment_methods text NULL,
    CONSTRAINT mv_pk PRIMARY KEY (request_id),
    CONSTRAINT mv_idx UNIQUE (fkey)
)

mv 的行数为 2,159,791 (810 MB),而 tb 的行数为 2,969,402 (222 MB)。由于它是LEFT JOIN唯一键,因此预期结果应该具有与 mv 本身相同数量的记录。

从两个表中选择所有列时,如何避免发生这种性能下降?

标签: sqlpostgresqlquery-optimizationdatabase-performance

解决方案


推荐阅读