首页 > 解决方案 > Oracle 查询性能:VIEW PUSHED PREDICATE 导致内部查询多次执行

问题描述

这与我们在查询涉及表和视图(具有 70+ 百万条记录)的连接时面临的性能问题有关。

在对不同环境中的执行计划进行广泛分析之后,我可以将其指向来自其中一个连接的 VIEW PUSHED PREDICATE 分支。

执行次数(执行计划上的开始列)等于驱动/外部表上返回的行数 - 可能是它正在评估外部结果集上的每个匹配项的视图。

由于这里涉及的表有数百万条记录,%CPU 和总体执行时间变得非常糟糕。如果我添加不推送谓词(no_push_pred)的提示,情况并非如此;处决只有1。

这是 VIEW PUSHED PREDICATE 所期望的,还是我错过了任何相关的概念?

Oracle 数据库版本:12c 企业版 12.1.0.2.0


我尝试使用简单的查询来模拟问题(或行为) - 请参阅下面的详细信息。

注意:这里添加了 no_merge 提示,以确保优化器在连接过程中不会合并视图,因此该计划与我实际查询的计划相同。

询问:

SELECT 
v.STATUS_CODE,
    a1.STATUS_DESC
    FROM STATUS_DETAIL a1,
    (select   /*+ no_merge push_pred */
        a2.STATUS_CODE
          from STATUS a2
             where  a2.STATUS_CODE < 50) v
  where a1.STATUS_CODE = v.STATUS_CODE;

执行计划(使用 TABLE(DBMS_XPLAN.display_cursor) 提取):

我指的是计划中的第 3 行和第 4 行 - Starts 列的值为 70(等于第 2 行的 A-Rows 列的值 - 驱动表访问)

 -----------------------------------------------------------------------------------------------------------------------------------                                                                                                                                                                         
| Id  | Operation               | Name          | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers |                                                                                                                                                                         
-----------------------------------------------------------------------------------------------------------------------------------                                                                                                                                                                         
|   0 | SELECT STATEMENT        |               |      1 |        |       |   213 (100)|          |     22 |00:00:00.01 |     350 |                                                                                                                                                                         
|   1 |  NESTED LOOPS           |               |      1 |     13 |   533 |   213   (0)| 00:00:01 |     22 |00:00:00.01 |     350 |                                                                                                                                                                         
|   2 |   TABLE ACCESS FULL     | STATUS_DETAIL |      1 |     70 |  1960 |     3   (0)| 00:00:01 |     70 |00:00:00.01 |       7 |                                                                                                                                                                         
|   3 |   VIEW PUSHED PREDICATE |               |     70 |      1 |    13 |     3   (0)| 00:00:01 |     22 |00:00:00.01 |     343 |                                                                                                                                                                         
|*  4 |    FILTER               |               |     70 |        |       |            |          |     22 |00:00:00.01 |     343 |                                                                                                                                                                         
|*  5 |     TABLE ACCESS FULL   | STATUS        |     49 |      1 |     4 |     3   (0)| 00:00:01 |     22 |00:00:00.01 |     343 |                                                                                                                                                                         
----------------------------------------------------------------------------------------------------------------------------------- 



Predicate Information (identified by operation id):                                                                                                                                                                                                                                                         
---------------------------------------------------                                                                                                                                                                                                                                                         

   4 - filter("A1"."STATUS_CODE"<50)                                                                                                                                                                                                                                                                        
   5 - filter(("A2"."STATUS_CODE"="A1"."STATUS_CODE" AND "A2"."STATUS_CODE"<50))

标签: oracleperformancejoinviewpredicate

解决方案


You are correct, the VIEW PUSHED PREDICATE operation means "The view thus becomes correlated and must be evaluated for each row of the outer query block".

This is a specific kind of predicate pushing, the join predicate pushdown transformation. The idea behind that transformation is that the view may need to be executed more frequently, but adding the join predicate to the view can make it run much faster, since the tables in that view can now use an index access.

There's nothing inherently wrong with the join predicate pushdown. Similar to a Cartesian product, executing a view multiple times is not necessarily bad. Sometimes a large number of fast things is better than a small number of slow things.

So why is Oracle making a bad choice here? Hard to say without a lot more data. Oracle is making some determination, using an equation roughly like this:

large number * small amount of time
<
small number * large amount of time

With a little more detail:

rows returned by outer query * time for index-accessed view
<
1 (for a hash join) * read smaller table, create hash function, then read the other table and probe it for matches, potentially writing and reading to temporary tablespace

Like with most query tuning, check the cardinalities.

Maybe Oracle significantly under-estimates the "large number" and thinks the rows returned by by the outer-table is much smaller than it really is. This can happen for a lot of reasons, like bad statistics, using lots of confusing functions the optimizer cannot estimate, using correlated columns with a relationship Oracle doesn't understand (unless you create multi-column histograms), etc.

Maybe Oracle significantly under-estimates the "small amount of time". It may think the index-access paths for the view are much faster than they really are. It could be for one of the reasons above, or it could be because somebody has messed with some critical parameters. It is unfortunately all too common for people to think, "indexes are fast, I should tell Oracle to use them more often by changing parameter defaults". Run this query and ensure the values are 0 and 100, the defaults.

select * from v$parameter where name in ('optimizer_index_cost_adj', 'optimizer_index_caching');

In practice, optimizer problems are almost always caused by Oracle under-estimating something, not over-estimating. So I'd focus on the left-hand side of the equation. Oracle is always trying to throw out as many rows as possible, and will always look for a way to get the cardinality down to 1. If there really is a path to get only one row out of a table, that can save a lot of work. But if there is only one way that looks like a quick path to one row, but isn't, it can mess up the whole plan.

If this is a ginormous query with lots of details, it's not unreasonable to give up on trying to find the root cause, and simply use a hint or an extra rownum psuedocolumn to force Oracle to stop transforming things.

Oracle provides a large number of data structures and algorithms to access the data. This gives the optimizer a lot of ways to find a faster way to run a query. But it also gives it more opportunities to make a mistake. No risk no reward, but there's no need to gamble on every query. If you have an outer query and an inner query that work fine separately, but don't work well together, by all means separate them and don't let Oracle try to combine them in a weird way.


Below is a quick sample, using tables similar to the ones in your query. It shows Oracle incorrectly using a VIEW PUSHED PREDICATE operation.

First, create some small tables, insert data, and gather stats. Everything looks fine so far.

drop table status_detail;
drop table status;

create table status_detail(status_code number, status_desc varchar2(100));
insert into status_detail select level, level from dual connect by level <= 10;

create table status(status_code number);
create index status_idx on status(status_code);

insert into status select level from dual connect by level <= 100000;
begin
    dbms_stats.gather_table_stats(user, 'status_detail');
    dbms_stats.gather_table_stats(user, 'status');
end;
/

Here comes the mistake. Pretend that someone loaded 100,000 rows in the STATUS_DETAIL table but forgot to re-gather stats. Oracle thinks the outer table only has 10 rows but it really has 100,000.

insert into status_detail select 1, level from dual connect by level <= 100000;
commit;

alter system flush shared_pool;

Run a query between STATUS_DETAIL and an inline view using STATUS. To prevent view merging, I added a join and a distinct to the query to make it hard to integrate A1 and V.

explain plan for
select count(*)
from status_detail a1,
(
    select distinct a2.status_code
    from status a2
    join status a3
        on a2.status_code=a3.status_code
) v
where a1.status_code = v.status_code;

Below is the bad execution plan. Oracle thinks STATUS_DETAIL only returns 10 rows, because of bad optimizer stats. That STATUS table is large, and joining it to itself is going to be expensive. Instead of joining a large table, Oracle can use a join predicate pushdown. By passing the STATUS_CODE predicate into the view, now it can use a simple INDEX RANGE SCAN operation on the large STATUS table. 10 small index range scans sounds faster than hash joining two large tables.

select * from table(dbms_xplan.display);

Plan hash value: 3172146404

------------------------------------------------------------------------------------------
| Id  | Operation                | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |     1 |     5 |    23   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |               |     1 |     5 |            |          |
|   2 |   NESTED LOOPS SEMI      |               |    10 |    50 |    23   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | STATUS_DETAIL |    10 |    30 |     3   (0)| 00:00:01 |
|   4 |    VIEW PUSHED PREDICATE |               |     1 |     2 |     2   (0)| 00:00:01 |
|   5 |     NESTED LOOPS SEMI    |               |     1 |    10 |     2   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN    | STATUS_IDX    |     1 |     5 |     1   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN    | STATUS_IDX    |     1 |     5 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("A2"."STATUS_CODE"="A1"."STATUS_CODE")
   7 - access("A3"."STATUS_CODE"="A1"."STATUS_CODE")
       filter("A2"."STATUS_CODE"="A3"."STATUS_CODE")

If we gather the statistics and tell Oracle the real size of the STATUS table, things look much different. 100,000 index scans is a slow way to access every row in the table. Instead, the new plan hash joins the STATUS table together, and then hash joins the results with STATUS_DETAIL. The run time on my PC decreases from 0.5 seconds to 0.1 seconds.

begin
    dbms_stats.gather_table_stats(user, 'status_detail');
    dbms_stats.gather_table_stats(user, 'status');
end;
/

explain plan for
select count(*)
from status_detail a1,
(
    select distinct a2.status_code
    from status a2
    join status a3
        on a2.status_code=a3.status_code
) v
where a1.status_code = v.status_code;

select * from table(dbms_xplan.display);

Plan hash value: 3579559806

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |               |     1 |       |       |   556   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE           |               |     1 |       |       |            |          |
|   2 |   VIEW                    | VM_NWVW_1     |    10 |       |       |   556   (2)| 00:00:01 |
|   3 |    HASH UNIQUE            |               |    10 |   190 |       |   556   (2)| 00:00:01 |
|*  4 |     HASH JOIN             |               |   100K|  1855K|  2056K|   552   (2)| 00:00:01 |
|   5 |      TABLE ACCESS FULL    | STATUS_DETAIL |   100K|   878K|       |    69   (2)| 00:00:01 |
|*  6 |      HASH JOIN SEMI       |               |   100K|   976K|  1664K|   277   (2)| 00:00:01 |
|   7 |       INDEX FAST FULL SCAN| STATUS_IDX    |   100K|   488K|       |    57   (2)| 00:00:01 |
|   8 |       INDEX FAST FULL SCAN| STATUS_IDX    |   100K|   488K|       |    57   (2)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("A1"."STATUS_CODE"="A2"."STATUS_CODE")
   6 - access("A2"."STATUS_CODE"="A3"."STATUS_CODE")

推荐阅读