首页 > 解决方案 > 使用 group by 优化复杂的 sql 查询

问题描述

我遇到了查询太慢(大约 2-3 分钟!)的问题。

完整的查询是

SELECT Count(DISTINCT activitytr0_.libraryid) AS col_0_0_
FROM   mma_activitytrackerlibrary activitytr0_
       inner join mma_activitytracker activities1_
               ON activitytr0_.libraryid = activities1_.libraryid
WHERE  ( activitytr0_.channel NOT IN ( 'Classica', 'VOD Contents', 'FOX' ) )
       AND ( activities1_.phase IN ( 'Quality control', 'Spot check' ) )
       AND ( activities1_.result NOT IN ( 'Fail', 'TBC' ) )
       AND ( activitytr0_.tx_date IS NULL
              OR activitytr0_.tx_date >= DATE '2020-07-13' )
       AND ( activitytr0_.ident_tx IN (SELECT channelsch3_.clipid
                                       FROM   mma_channelscheduleevents channelsch3_
                                       GROUP  BY channelsch3_.clipid
                                       HAVING Min(channelsch3_.eventstarttime) >= DATE '2020-07-13')
              OR ( activitytr0_.tx_date IS NULL
                   AND activitytr0_.ident_tx NOT IN (SELECT channelsch4_.clipid
                                                     FROM   mma_channelscheduleevents channelsch4_) ) )
       AND activitytr0_.trash <> 'Y' 

“切割”查询,最慢的部分是:

SELECT channelsch3_.clipid
FROM   mma_channelscheduleevents channelsch3_
GROUP  BY channelsch3_.clipid
HAVING Min(channelsch3_.eventstarttime) >= DATE '2020-07-13')

这个查询大约需要 21 秒,在一个有 ~46000000 行的表上,在clipideventstarttime字段上有索引,并返回大约 2000 行

请记住,这是一个子查询,它是field not in (...)子句的一部分

我不知道如何优化查询,请记住clipid可以重复,我只想要“过去”中clipid甚至没有关联的 s eventstarttime

所以,对于这个简单的例子

+--------+----------------+
| clipid | eventstarttime |
+--------+----------------+
| A      | 2020-01-01     |
| A      | 2021-01-01     |
| B      | 2020-01-01     |
| C      | 2021-01-01     |
+--------+----------------+

我只想要一排,与C

--- 应萨扬·马拉克希诺夫的要求

select * from user_tab_col_statistics where table_name='MMA_CHANNELSCHEDULEEVENTS'

返回

TABLE_NAME               |COLUMN_NAME            |NUM_DISTINCT|LOW_VALUE  |HIGH_VALUE |DENSITY|NUM_NULLS|NUM_BUCKETS|LAST_ANALYZED      |SAMPLE_SIZE|GLOBAL_STATS|USER_STATS|AVG_COL_LEN|HISTOGRAM|
-------------------------|-----------------------|------------|-----------|-----------|-------|---------|-----------|-------------------|-----------|------------|----------|-----------|---------|
MMA_CHANNELSCHEDULEEVENTS|EVENTID                |           1|ÅT  ]4     |ÅT  ]4     |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          7|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEID             |           1|Â Z        |Â Z        |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          4|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTTYPE              |           1|           |           |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          2|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTSTARTTIME         |           1|xx         |xx         |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          8|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTENDTIME           |           1|xx         |xx         |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          8|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTACTUALSTARTFRAMES |           1|Ä =        |Ä =        |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          4|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTSECEVENTOFFSET    |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|CLIPID                 |           1|THL50883   |THL50883   |      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |          9|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SEGMENTNUMBER          |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDTITLE         |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDSOM           |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATION      |           1|00:00:00:00|00:00:00:00|      1|        0|          1|2020-07-13 22:00:08|          1|YES         |NO        |         12|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATIONFRAMES|           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|EVENTCONTROLCODES      |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|RECONCILEKEY           |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDSOMBCD        |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |
MMA_CHANNELSCHEDULEEVENTS|SCHEDULEDDURATIONBCD   |           0|           |           |      0|        1|          0|2020-07-13 22:00:08|           |YES         |NO        |          0|NONE     |

标签: sqloracle

解决方案


我们无法优化您的原始查询,因为您没有显示它并且我们不知道带有统计信息的真实执行计划,但我可以建议您一种方法来改进您向我们展示的那些部分:

  1. 您需要在 (eventstarttime, clipid) 上创建索引
  2. 你这样的查询:
SELECT DISTINCT c.clipid
FROM mma_channelscheduleevents c
WHERE c.eventstarttime>=date'2020-07-13'
and   not exists (select 1
                  from mma_channelscheduleevents c2
                  where c2.clipid = c.clipid and c2.eventstarttime < date'2020-07-13'
                 );

eventstarttime>=date'2020-07-13'如果行数远少于行数,它应该会快得多eventstarttime<date'2020-07-13'


推荐阅读