sql - SQL BigQuery 资源超出了对大表(1 亿行)进行 LEFT JOIN 的操作,我应该怎么做才能获得良好的性能/优化?
问题描述
我想从我的客户表(1 亿行)和我的活动表(2000 行)中检索信息。我一直将活动表用作LEFT 表,因为我想要那里的所有数据(我知道这COUNT(DISTINCT)
对良好的性能没有帮助,但我不知道 DISTINCT 之外的其他子句)。
我一直在使用这个查询:
SELECT
COUNT(DISTINCT
CASE
WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.freq <= 1 THEN client_id
END
) AS new_client,
FROM CAMP
LEFT JOIN CLIENT
ON CAMP.country=CLIENT.country
行 | 新客户 |
---|---|
1 | 6543210 |
这个单一的查询花了我 23 分钟来检索数据。这只是我真正想要使用的查询的一部分(在下一个查询中我有错误“resourcesExceeded 400”)。我没有使用WHERE
子句,因为我需要所有的客户。看看他们是新客户还是老客户。
我真正想使用的查询是这个:
SELECT
COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.freq <= 1 THEN client_id END) AS new_client,
COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.freq > 1 THEN client_id END) AS old_client,
COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND freq = 1 THEN client_id END ) + COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country NOT IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND year_freq >= 1 AND year_freq <= 3 THEN client_id END ) AS low_activity,
COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND freq = 2 THEN client_id END ) + COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country NOT IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND year_freq > 3 AND year_freq <= 6 THEN client_id END ) AS medium_activity,
COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND freq >= 3 THEN client_id END ) + COUNT(DISTINCT CASE WHEN CLIENT.date <= CAMP.end_date AND CLIENT.date >= CAMP.start_date AND CLIENT.country NOT IN ("USA","EUROPA","MEXICO","OCEANIA","ASIA") AND year_freq > 6 THEN client_id END ) AS high_activity,
FROM CAMP
LEFT JOIN CLIENT
ON CAMP.country = CLIENT.country
此查询运行 1 小时,然后因错误“超出资源”而中断。所以我问你是否有另一种方法来改进查询。
我想要的是在我做特别活动的那几周以及在老客户中的medium_activity客户端和high_activity客户端中,获得新老客户的份额。new_client 将是low_activity。
我 99% 确定这个查询会因为太多COUNT(DISTINCT)
. 作为 SQL 中的菜鸟,我不知道如何优化此查询。如果有更好的东西COUNT(DISTINCT)
知道会很酷。
客户端表未使用任何分区
任何改进此查询的性能/优化的建议、意见、帮助、新子句、新语句等都是非常受欢迎的。
解决方案
推荐阅读
- android - 如何解决问题 SQLiteException: no such table: Cars (code 1 SQLITE_ERROR): , while compile: SELECT DISTINCT * FROM Cars?
- python - Python 显示时间跨度
- asp.net-core-3.1 - .NET Core 3.1 with Identity - 如何从 SQL 数据库动态设置 CookieAuthenticationOptions
- spring-boot - Pivotal Web 服务:需要 Flyway 企业版或 MySQL 升级
- python - 在 Python 中安装“预览生成器”模块会导致 FileNotFoundError
- python - Web Scrape - Python - 如果字符串中的“文本”然后字符串 =“文本”问题 - 冲突的命令?
- azure - 尝试将 Azure 业务用于消费者时出错
- javascript - Rithim 使用 fetch 和 Promise 的异步 Javascript 练习
- c# - 如何在机器人框架 v4 中从 ActivityHandler.OnMessageActivityAsync 启动瀑布对话框
- excel - 刷新列表框并返回原始行