sql - 雪花查询引擎策略上的几种带查询条件
问题描述
我正在做从 pyspark 查询到雪花查询的迁移工作,并想知道 A、B 以下选项之间哪个选项更好。
为避免不必要的查询,如果没有显着的性能差异,我想选择 B 选项。
在 B 选项中,雪花查询引擎是否自动优化并且内部行为类似于 A 选项?
一个选项
With A1 AS (select * from a1 where date='2021-10-20'),
A2 AS (select * from a2 where date='2021-10-20'),
A3 AS (select * from a3 where date='2021-10-20'),
A4 AS (select * from a4 where date='2021-10-20'),
A5 AS (select * from a5 where date='2021-10-20')
SELECT *
FROM final_merged_table
和 B 选项
With A1 AS (select * from a1),
A2 AS (select * from a2),
A3 AS (select * from a3),
A4 AS (select * from a4),
A5 AS (select * from a5)
SELECT *
FROM final_merged_table
WHERE date = '2021-10-20'
解决方案
我们可以对此进行测试。首先,让我们构建一个包含一周日期和几百万行的表:
create or replace table one_week2
as
select '2020-04-01'::date + (7*seq8()/100000000)::int day, random() data, random() data2, random() data3
from table(generator(rowcount => 100000000))
现在我们可以编写两个查询来遍历这个表:
选项1:
With A1 AS (select * from one_week2 where day='2020-04-05'),
A2 AS (select * from one_week2 where day='2020-04-05'),
A3 AS (select * from one_week2 where day='2020-04-05'),
A4 AS (select * from one_week2 where day='2020-04-05'),
A5 AS (select * from one_week2 where day='2020-04-05'),
final_merged_table as (
select * from a1
union all select * from a2
union all select * from a3
union all select * from a4
union all select * from a5)
SELECT count(*)
FROM final_merged_table
选项 2:
With A1 AS (select * from one_week2),
A2 AS (select * from one_week2),
A3 AS (select * from one_week2),
A4 AS (select * from one_week2),
A5 AS (select * from one_week2),
final_merged_table as (
select * from a1
union all select * from a2
union all select * from a3
union all select * from a4
union all select * from a5)
SELECT count(*)
FROM final_merged_table
where day='2020-04-05'
;
当我们运行这些查询时,两者的配置文件看起来相同 - 因为过滤器已被按下:
选项 1 配置文件
选项 2 配置文件
总之
您可以信任 Snowflake 优化器。
信任很重要,但也要验证:有时优化器可能会被复杂的 CTE 弄糊涂。有时,Snowflake engs 优化了优化器,今天不起作用的东西,明天可以更好地工作。
推荐阅读
- matplotlib - 显示总体趋势的许多样本的密度图 - 如何?
- python - 如何向我的 Python 端口扫描程序添加提示,要求用户输入他们想要扫描的端口?
- python - 如何将字符串与字典键进行比较
- python - Python透明KivyMD导航抽屉
- routes - Nuxt 子域路由
- python - 在 Python 中添加列表与扩展列表
- javascript - 如何访问函数对象中的函数?
- azure - 如何在azure数据流中处理excel
- python - python http请求(requests.get)已停止工作
- ios - 重新加载部分后无法取消选择 tableView 行