sql - Fixing a table join in a SQL query causes the query to never finish
Problem description
I'm working in Teradata with a legacy SQL query that builds a volatile table used to push data to a downstream analytics application. I'm new to this work, and after a quick look at the tables I noticed what looked like an error in the query: a join condition that references the same table on both sides. After fixing it, I re-ran the query, expecting better results.
The database holds billions of records. Before the fix, the original volatile table contained roughly 2.5 million rows and took about 5 seconds to build. After the fix, the query runs for hours with no sign of progress, and then usually crashes the application.
The original query is below. I've commented the line where I "fixed" the join.
I also ran just the SELECT statement with SAMPLE 10, purely to see whether I could pull back any data at all, but again it ran forever.
Any ideas on why the query isn't completing? Or how to improve performance enough to see whether it actually works? Or even how to check whether anything is happening under the hood?
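(For the "under the hood" part: one way — assuming DBQL query logging is enabled on your system, which is an assumption to check with your DBA — is to look at recent requests for your user in dbc.QryLogV; the column names below come from the standard DBQL view and may vary by release.)

```sql
-- Sketch: recent requests for the current user from the DBQL log
-- (assumes query logging is enabled; view/column names may differ by release)
SELECT StartTime,
       FirstRespTime,
       TotalIOCount,
       SpoolUsage,
       ErrorCode,
       QueryText
FROM   dbc.QryLogV
WHERE  UserName = USER
  AND  StartTime > CURRENT_TIMESTAMP - INTERVAL '1' DAY
ORDER  BY StartTime DESC;
```

Note that DBQL only records a request once it finishes or aborts; for a query that is still running you'd need a monitoring tool such as Viewpoint.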
create multiset volatile table order_date as
(
SELECT a.ord_nbr,
a.ord_spot_cnt AS ord_spot_cnt,
a.ord_totl_amt AS ord_totl_amt,
a.eclipse_regn_nm AS eclipse_regn_nm,
a.sale_offc_key AS sales_office,
so.sale_offc_nm AS sales_office_name,
a.sale_prsn_key AS sales_person,
sp.sale_prsn_nm AS sales_person_name,
trunc(a.ctrc_begn_dt, 'month') ctrc_mnth,
(cast(a.ctrc_end_dt AS date format 'yyyy/mm/dd') - cast(a.ctrc_begn_dt AS date format 'yyyy/mm/dd')) AS ctrc_duration,
CASE WHEN extract (month FROM a.ctrc_begn_dt) in (10,11) THEN 1
WHEN extract (month FROM a.ctrc_begn_dt) in (12) AND extract (day FROM a.ctrc_begn_dt) <= 22 THEN 1 ELSE 0 END AS first_look_flg,
CASE WHEN extract (month FROM a.ctrc_begn_dt) in (12) AND extract (day FROM a.ctrc_begn_dt) > 22 THEN 2
WHEN extract (month FROM a.ctrc_begn_dt) in (1) THEN 2
WHEN extract (month FROM a.ctrc_begn_dt) in (2) AND extract (day FROM a.ctrc_begn_dt) <= 14 THEN 1 ELSE 0 END AS second_look_flg
FROM PRD_AM_BI_1.am_order_eda_fact a
inner join prd_am_bi_1.AM_SALES_OFFICE_EDA_DIM so
on (a.SALE_OFFC_KEY = so.SALE_OFFC_KEY
and so.ECLIPSE_REGN_NM = so.ECLIPSE_REGN_NM)
-- I 'fixed' the above line to "and a.ECLIPSE_REGN_NM = so.ECLIPSE_REGN_NM)"
inner join prd_am_bi_1.AM_SALES_PERSON_EDA_DIM sp
on (a.SALE_PRSN_KEY = sp.SALE_PRSN_KEY
and a.ECLIPSE_REGN_NM = sp.ECLIPSE_REGN_NM)
)
with data
on commit preserve rows
;
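One way to narrow down which join is the problem is to run the SELECT with a single join at a time plus a sample — a sketch, reusing the table and column names from the query above:

```sql
-- Diagnostic sketch: test one join in isolation with a small sample
-- to see which join blows up (names taken from the query above)
SELECT a.ord_nbr,
       so.sale_offc_nm
FROM   PRD_AM_BI_1.am_order_eda_fact a
INNER JOIN prd_am_bi_1.AM_SALES_OFFICE_EDA_DIM so
  ON  a.SALE_OFFC_KEY   = so.SALE_OFFC_KEY
  AND a.ECLIPSE_REGN_NM = so.ECLIPSE_REGN_NM
SAMPLE 10;
```

If one join returns quickly on its own and the other does not, that tells you where the intermediate result is exploding.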
Edit:
Here is the EXPLAIN output for the "fixed" query. The total estimated time is the same for both queries, so I'm not sure why the "fixed" one never actually completes.
Explanation
------------------------------------------------------------------------
1) First, we lock PRD_AM_1.AM_ORDER_EDA_FACT in view
PRD_AM_BI_1.am_order_eda_fact in TD_MAP1 for access, we lock
PRD_AM_1.AM_SALES_PERSON_EDA_DIM in view
PRD_AM_BI_1.AM_SALES_PERSON_EDA_DIM in TD_MAP1 for access, and we
lock PRD_AM_1.AM_SALES_OFFICE_EDA_DIM in view
PRD_AM_BI_1.AM_SALES_OFFICE_EDA_DIM in TD_MAP1 for access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step in TD_MAP1 from
PRD_AM_1.AM_SALES_OFFICE_EDA_DIM in view
PRD_AM_BI_1.AM_SALES_OFFICE_EDA_DIM by way of an all-rows
scan with no residual conditions into Spool 2 (all_amps),
which is redistributed by hash code to all AMPs to all AMPs
in TD_Map1. Then we do a SORT to order Spool 2 by row hash.
The size of Spool 2 is estimated with high confidence to be
426 rows (21,300 bytes). The estimated time for this step is
0.01 seconds.
2) We do an all-AMPs RETRIEVE step in TD_MAP1 from
PRD_AM_1.AM_SALES_PERSON_EDA_DIM in view
PRD_AM_BI_1.AM_SALES_PERSON_EDA_DIM by way of an all-rows
scan with no residual conditions into Spool 3 (all_amps),
which is redistributed by hash code to all AMPs to all AMPs
in TD_Map1. Then we do a SORT to order Spool 3 by row hash.
The size of Spool 3 is estimated with high confidence to be
229,155 rows (10,770,285 bytes). The estimated time for this
step is 0.04 seconds.
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from
PRD_AM_1.AM_ORDER_EDA_FACT in view
PRD_AM_BI_1.am_order_eda_fact by way of an all-rows scan with
no residual conditions into Spool 4 (all_amps), which is
redistributed by hash code to all AMPs to all AMPs in TD_Map1.
Then we do a SORT to order Spool 4 by row hash. The size of
Spool 4 is estimated with high confidence to be 2,041,151
rows (136,757,117 bytes). The estimated time for this step
is 0.21 seconds.
3) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
way of a RowHash match scan, which is joined to Spool 3 (Last Use)
by way of a RowHash match scan. Spool 2 and Spool 3 are joined
using a merge join, with a join condition of ("ECLIPSE_REGN_NM =
ECLIPSE_REGN_NM"). The result goes into Spool 5 (all_amps), which
is built locally on the AMPs. The size of Spool 5 is estimated
with no confidence to be 203,927 rows (17,129,868 bytes). The
estimated time for this step is 0.02 seconds.
4) We do an all-AMPs JOIN step in TD_Map1 from Spool 4 (Last Use) by
way of a RowHash match scan, which is joined to Spool 5 (Last Use)
by way of a RowHash match scan. Spool 4 and Spool 5 are joined
using a merge join, with a join condition of ("(ECLIPSE_REGN_NM =
ECLIPSE_REGN_NM) AND ((SALE_PRSN_KEY = SALE_PRSN_KEY) AND
((ECLIPSE_REGN_NM = ECLIPSE_REGN_NM) AND (SALE_OFFC_KEY =
SALE_OFFC_KEY )))"). The result goes into Spool 1 (group_amps),
which is built locally on the AMPs. The size of Spool 1 is
estimated with no confidence to be 2,041,151 rows (293,925,744
bytes). The estimated time for this step is 0.02 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.24 seconds.
Solution
Look at the EXPLAIN plan to see where the query spends its cost, and pay attention to the confidence levels, not just the time estimates. In your plan, both join steps (3 and 4) are estimated "with no confidence": the optimizer first merge-joins the 426-row office dimension to the 229,155-row salesperson dimension on ECLIPSE_REGN_NM alone, then joins the fact table to that result. When the optimizer has no confidence in its row estimates, the intermediate spool can be far larger than estimated, and the 0.24-second total is meaningless. Then focus on reducing that cost by using different joins, filters, subqueries, or indexes, and by making sure statistics are current on the join columns so the optimizer has real numbers to plan with.
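Since the join estimates are "no confidence", a low-risk first step is to collect statistics on the join columns. A sketch, using the base-table names that appear in the EXPLAIN output (whether you have COLLECT STATISTICS rights on these tables is an assumption):

```sql
-- Multi-column statistics on the join columns used by the query
-- (requires statistics privileges on the base tables)
COLLECT STATISTICS COLUMN (SALE_OFFC_KEY, ECLIPSE_REGN_NM)
    ON PRD_AM_1.AM_ORDER_EDA_FACT;
COLLECT STATISTICS COLUMN (SALE_PRSN_KEY, ECLIPSE_REGN_NM)
    ON PRD_AM_1.AM_ORDER_EDA_FACT;
COLLECT STATISTICS COLUMN (SALE_OFFC_KEY, ECLIPSE_REGN_NM)
    ON PRD_AM_1.AM_SALES_OFFICE_EDA_DIM;
COLLECT STATISTICS COLUMN (SALE_PRSN_KEY, ECLIPSE_REGN_NM)
    ON PRD_AM_1.AM_SALES_PERSON_EDA_DIM;
```

Afterwards, re-run EXPLAIN and check whether the "no confidence" estimates change. Running `DIAGNOSTIC HELPSTATS ON FOR SESSION;` before an EXPLAIN will also make the optimizer list the statistics it thinks are missing.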