首页 > 解决方案 > 雪花查询引擎策略上的几种带查询条件

问题描述

我正在做从 pyspark 查询到雪花查询的迁移工作,并想知道 A、B 以下选项之间哪个选项更好。

为避免不必要的查询,如果没有显着的性能差异,我想选择 B 选项。

在 B 选项中,雪花查询引擎是否自动优化并且内部行为类似于 A 选项?

一个选项

With A1 AS (select * from a1 where date='2021-10-20'),
A2 AS (select * from a2 where date='2021-10-20'),
A3 AS (select * from a3 where date='2021-10-20'),
A4 AS (select * from a4 where date='2021-10-20'),
A5 AS (select * from a5 where date='2021-10-20')
SELECT *
FROM final_merged_table

和 B 选项

With A1 AS (select * from a1),
A2 AS (select * from a2),
A3 AS (select * from a3),
A4 AS (select * from a4),
A5 AS (select * from a5)
SELECT *
FROM final_merged_table
WHERE date = '2021-10-20'

标签: sqlsnowflake-cloud-data-platformquery-engine

解决方案


我们可以对此进行测试。首先,让我们构建一个包含一周日期和几百万行的表:

create or replace table one_week2
as
select '2020-04-01'::date + (7*seq8()/100000000)::int day, random() data, random() data2, random() data3
from table(generator(rowcount => 100000000))

现在我们可以编写两个查询来遍历这个表:

选项1:

With A1 AS (select * from one_week2 where day='2020-04-05'),
A2 AS (select * from one_week2 where day='2020-04-05'),
A3 AS (select * from one_week2 where day='2020-04-05'),
A4 AS (select * from one_week2 where day='2020-04-05'),
A5 AS (select * from one_week2 where day='2020-04-05'),
final_merged_table as (
    select * from a1 
    union all select * from a2
    union all select * from a3
    union all select * from a4
    union all select * from a5)

SELECT count(*)
FROM final_merged_table

选项 2:

With A1 AS (select * from one_week2),
A2 AS (select * from one_week2),
A3 AS (select * from one_week2),
A4 AS (select * from one_week2),
A5 AS (select * from one_week2),
final_merged_table as (
    select * from a1 
    union all select * from a2
    union all select * from a3
    union all select * from a4
    union all select * from a5)

SELECT count(*)
FROM final_merged_table
where day='2020-04-05'
;

当我们运行这些查询时,两者的配置文件看起来相同 - 因为过滤器已被按下:

选项 1 配置文件

在此处输入图像描述

选项 2 配置文件

在此处输入图像描述

总之

您可以信任 Snowflake 优化器。

信任很重要,但也要验证:有时优化器可能会被复杂的 CTE 弄糊涂。有时,Snowflake engs 优化了优化器,今天不起作用的东西,明天可以更好地工作。

在此处输入图像描述


推荐阅读