sql - 将列添加到 SELECT 后,PostgreSQL 左连接变得非常慢
问题描述
我习惯于使用物化视图,直到现在我还没有遇到任何性能问题。我正在使用 PostgreSQL v9.4.5。因此,当我LEFT JOIN
使用表格对物化视图进行简单操作时,一切正常,直到选择一定数量的列(目前总共 56 列)。当我在语句中添加一个额外的列时SELECT
,执行时间从 3 秒跳到 5 分钟以上,这很奇怪且不可接受。
我已经尝试过使用不同的列组合,它似乎真的取决于数量而不是类型或列中的数据。然后我使用该EXPLAIN
语句来查看计划者在这两种情况下都在做什么,我发现
Hash Left Join (cost=115615.56..501280.28 rows=2159465 width=400) (actual time=1696.312..5210.677 rows=2159465 loops=1)
Hash Cond: (mv.fkey = tb.id)
Buffers: shared hit=132206, temp read=45756 written=45754
-> Seq Scan on mv (cost=0.00..125290.65 rows=2159465 width=372) (actual time=0.005..288.453 rows=2159465 loops=1)
Buffers: shared hit=103696
-> Hash (cost=58202.25..58202.25 rows=2969225 width=28) (actual time=1693.613..1693.613 rows=2969225 loops=1)
Buckets: 524288 Batches: 2 Memory Usage: 89300kB
Buffers: shared hit=28510, temp written=8987
-> Seq Scan on tb (cost=0.00..58202.25 rows=2969225 width=28) (actual time=0.010..585.523 rows=2969225 loops=1)
Buffers: shared hit=28510
Planning time: 0.376 ms
Execution time: 5316.682 ms
变成
Hash Right Join (cost=257726.96..505730.81 rows=2159465 width=408) (actual time=13991.355..810029.353 rows=2159465 loops=1)
Hash Cond: (tb.id = mv.fkey)
Buffers: shared hit=132206, temp read=368900 written=237830
-> Seq Scan on tb (cost=0.00..58202.25 rows=2969225 width=36) (actual time=0.006..631.307 rows=2969225 loops=1)
Buffers: shared hit=28510
-> Hash (cost=125290.65..125290.65 rows=2159465 width=372) (actual time=9442.991..9442.991 rows=2159465 loops=1)
Buckets: 65536 Batches: 131072 (originally 8) Memory Usage: 204367kB
Buffers: shared hit=103696, temp written=61097
-> Seq Scan on mv (cost=0.00..125290.65 rows=2159465 width=372) (actual time=0.005..357.327 rows=2159465 loops=1)
Buffers: shared hit=103696
Planning time: 0.329 ms
Execution time: 810230.707 ms
我不明白为什么会发生散列左连接和散列右连接之间的变化,也不明白它是否与问题相关。在 tb.id 和 mv.fkey 上创建索引可以提高性能,但仅持续大约 1 分钟。这是我非常简单的查询:
SELECT mv.*, tb.id, tb.ts, tb.f_score, tb.adress, tb.score, tb.decision, tb.rating
-- , tb.rating_level -- that one changes everything
FROM mv
LEFT JOIN tb on tb.id = mv.fkey
这是 DDL:
CREATE TABLE tb (
id int4 NOT NULL,
ts timestamp NULL,
f_score int4 NULL,
adress text NULL,
decision text NULL,
score int4 NULL,
rating text NULL,
rating_level text NULL,
risk_indicator int4 NULL,
CONSTRAINT tb_pk PRIMARY KEY (id)
)
CREATE TABLE mv (
request_id int4 NOT NULL,
customer_id text NULL,
customer_category text NULL,
id_document_type text NULL,
nationality text NULL,
gender text NULL,
date_of_birth date NULL,
active_postpaid_subscriptions_count int4 NULL,
accumulated_customer_recurring_charges_value text NULL,
agreement_start_date timestamp NULL,
latest_agreement_start_date timestamp NULL,
bill_payment_method text NULL,
blacklisted_fraud text NULL,
blacklisted_collection text NULL,
created_at timestamp NULL,
country text NULL,
city text NULL,
zip text NULL,
order_id text NULL,
order_type text NULL,
includes_mnp int4 NULL,
dealer_chain_code text NULL,
dealer_chain_name text NULL,
typed text NULL,
order_channel text NULL,
is_blacklisted int4 NULL,
is_whitelisted int4 NULL,
response_id int4 NULL,
order_score int4 NULL,
state text NULL,
business_instruction_template_names text NULL,
fkey int4 NULL,
response_received int4 NULL,
orderer_type text NULL,
internal_score int4 NULL,
external_final_score int4 NULL,
final_score int4 NULL,
payment_type text NULL,
installment text NULL,
simo text NULL,
price_plan_market text NULL,
product_name text NULL,
one_time_fee numeric NULL,
one_time_discount numeric NULL,
recurring_fee float8 NULL,
recurring_discount float8 NULL,
down_payment_amount text NULL,
allowed_postpaid_subscription_count int4 NULL,
allowed_postpaid_with_prepayment_subscription_count int4 NULL,
allowed_payment_methods text NULL,
CONSTRAINT mv_pk PRIMARY KEY (request_id),
CONSTRAINT mv_idx UNIQUE (fkey)
)
mv 的行数为 2,159,791 (810 MB),而 tb 的行数为 2,969,402 (222 MB)。由于它是LEFT JOIN
唯一键,因此预期结果应该具有与 mv 本身相同数量的记录。
从两个表中选择所有列时,如何避免发生这种性能下降?
解决方案
推荐阅读
- macos - 如何使用曲线为 NSView 设置动画
- sql - 使用 LAG() 和 PARTITION BY 在日期的 10 天内返回行 IF
- swift - 如何在带有 RxSwift 的驱动程序上使用 flatMapLatest
- javascript - 如何从 Chrome 的 console.log() 输出中理解 JS 对象结构?
- python-3.x - CIFAR10 示例:Keras
- python - 来自 tf.keras.preprocessing.image.ImageDataGenerator.flow_from_directory 的 tf.data.Dataset?
- c# - c#如何使用Linq显示图表Y值标签
- python-3.x - 如果一个字段相关为False,我如何循环计算字段以使其计算方法发生变化?
- github - 声纳不报告 github 拉取请求的状态
- docker - 在 Docker 的仪表板上更改 Directus 7 API url