首页 > 解决方案 > 使用“或”连接单个表中的两列太慢

问题描述

我正在尝试获取从上个月最后一天到当月最后一天的总交易计数和金额(时间是 '15:00:00')

表结构

CREATE TABLE mjr_agent
(
    id bigint NOT NULL UNIQUE,
    head_agent_id varchar(20),
    code varchar(20) NOT NULL UNIQUE,
    password varchar(200) NOT NULL,
    agent_type smallint NOT NULL,
    agent_group smallint NOT NULL,
    division varchar(100) NOT NULL,
    township varchar(200) NOT NULL,
    shop_name varchar(200),
    contact_name varchar(150) NOT NULL,
    ph_no varchar(50),
    contact_ph_no varchar(50),
    address varchar(500),
    record_reg_id bigint NOT NULL,
    record_upd_id bigint NOT NULL,
    record_reg_date timestamp DEFAULT current_timestamp NOT NULL,
    record_upd_date timestamp DEFAULT current_timestamp NOT NULL,
    user_id bigint,
    PRIMARY KEY (id)
) WITHOUT OIDS;

CREATE TABLE mjr_transaction_item
(
    id bigint NOT NULL UNIQUE,
    exl_id bigint NOT NULL,
    sr_no int NOT NULL,
    txn_id varchar(20) NOT NULL,
    txn_date timestamp NOT NULL,
    txn_status smallint NOT NULL,
    src_id varchar(20) NOT NULL,
    src_type smallint NOT NULL,
    src_name varchar(100),
    src_ocpt varchar(100),
    dest_id varchar(20) NOT NULL,
    dest_type smallint NOT NULL,
    dest_name varchar(100),
    desc_ocpt varchar(100),
    amount numeric,
    reason varchar(250),
    medium smallint NOT NULL,
    record_reg_id bigint NOT NULL,
    record_upd_id bigint NOT NULL,
    record_reg_date timestamp DEFAULT current_timestamp NOT NULL,
    record_upd_date timestamp DEFAULT current_timestamp NOT NULL,
    PRIMARY KEY (id)
) WITHOUT OIDS;

每个表中的总记录

mjr_agent = 700 和 mjr_transaction_item = 1136043

询问

SELECT
    agnt.ID AS agent_id,
    COUNT ( DISTINCT txi.ID ) AS total_transaction,
    SUM ( txi.amount ) AS total_amount
FROM
    mjr_agent agnt
    LEFT JOIN mjr_transaction_item txi ON agnt.code = txi.src_id 
    OR agnt.code = txi.dest_id
WHERE
    CONCAT (( date_trunc( 'month', to_timestamp( '2019-04-01', 'YYYY-MM-DD' )) + INTERVAL '1 month' - INTERVAL '1 day' ) :: DATE, ' 15:00:00' ) :: TIMESTAMP >= txi.txn_date 
    AND txi.txn_date >= CONCAT (( date_trunc( 'month', to_timestamp( '2019-04-01', 'YYYY-MM-DD' )) :: DATE - 1 ), ' 15:00:00' ) :: TIMESTAMP 
GROUP BY
    agnt.ID 

查询计划

GroupAggregate  (cost=1000.27..52530.69 rows=86 width=48) (actual time=779.221..14716.528 rows=400 loops=1)
  Group Key: agnt.id
  Buffers: shared hit=10729
  ->  Nested Loop  (cost=1000.27..52523.53 rows=812 width=48) (actual time=759.898..14708.292 rows=10132 loops=1)
        Join Filter: (((agnt.code)::text = (txi.src_id)::text) OR ((agnt.code)::text = (txi.dest_id)::text))
        Rows Removed by Join Filter: 65321568
        Buffers: shared hit=10729
        ->  Index Scan using mjr_agent_pkey on mjr_agent agnt  (cost=0.27..189.56 rows=86 width=66) (actual time=0.045..1.827 rows=700 loops=1)
              Buffers: shared hit=92
        ->  Materialize  (cost=1000.00..50911.10 rows=947 width=156) (actual time=0.001..5.658 rows=93331 loops=700)
              Buffers: shared hit=10637
              ->  Gather  (cost=1000.00..50906.37 rows=947 width=156) (actual time=0.809..616.836 rows=93331 loops=1)
                    Workers Planned: 4
                    Workers Launched: 4
                    Buffers: shared hit=10637
                    ->  Parallel Seq Scan on mjr_transaction_item txi  (cost=0.00..49811.67 rows=237 width=156) (actual time=0.150..625.545 rows=18666 loops=5)
                          Filter: ((txn_date >= (concat(((date_trunc('month'::text, to_timestamp('2019-04-01'::text, 'YYYY-MM-DD'::text)))::date - 1), ' 15:00:00'))::timestamp without time zone) AND ((concat((((date_trunc('month'::text, to_timestamp('2019-04-01'::text, 'YYYY-MM-DD'::text)) + '1 mon'::interval) - '1 day'::interval))::date, ' 15:00:00'))::timestamp without time zone >= txn_date))
                          Rows Removed by Filter: 208542
                          Buffers: shared hit=47351
Planning time: 0.582 ms
Execution time: 14718.609 ms

硬件规格

Cent OS 7.6,内存 32 GB,8 CPU,Postgres 10.0

平均查询执行时间约为12秒,这对我的应用程序来说是不可接受的。我在数据库调优方面没有太多经验,而且我也不太擅长编写 SQL。所以,我请求经验开发者的每一个建议。任何建议将不胜感激。

标签: sqlpostgresql

解决方案


带有 OR 条件的(左)JOIN 通常没有得到很好的优化。但是您可以做的是在两列的数组上创建一个索引,然后将连接条件更改为使用数组运算符

首先创建索引:

create index on mjr_transaction_item using gin ( (array[src_id, dest_id]) );

然后改变JOIN条件:

...
FROM mjr_agent agnt
  LEFT JOIN mjr_transaction_item txi ON array[agnt.code] <@ array[txi.src_id, txi.dest_id] 
WHERE ...

您也可以使用 ANY 条件,但我认为这不会使用索引:

...
FROM mjr_agent agnt
  LEFT JOIN mjr_transaction_item txi ON agnt.code = ANY(array[txi.src_id, txi.dest_id]) 
WHERE ...

推荐阅读