sql - Google Big Query 中的扩展资源
问题描述
我正在尝试在 bigquery 脚本上使用 ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) 并继续遇到“超出资源”错误。
该表的大小为 219.96 GB,行数为 1,611,220,127
这是脚本:
With cte as (
SELECT
Source,
ROW_NUMBER() OVER (PARTITION BY FirstName, LastName, MiddleName, Address, Address2, City, State, Zip ORDER BY Attom_ID DESC) as rnk
,FirstName, LastName, MiddleName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, Attom_ID, GEOID, Score, Score1, Score2, Score3, Score4, Score5, Latitude, Longitude
from `db.ds.tblA`
) select * from cte where rnk = 1
虽然这是一张表,但它是连接的结果,其中 PRIOR 到 ATTOM_ID 的所有列都来自一个表,而 ATTOM_ID、GEOID 等来自第二个表。我相信结果集中有一些笛卡尔。
表中有多种重复项,我正在尝试对表进行重复数据删除。我害怕尝试使用 MAX(ATTOM_ID) 进行 GROUP_BY,因为我想确保使用正确关联的 GEOID 和 SCORES。我不想把它混在一起。
问题是这个特定的查询超出了资源,所以我想知道我这里是否有任何选项。谢谢!
解决方案
以下相当于您的原始查询(结果方面),通常解决“超出资源”问题
#standardSQL
SELECT r.* FROM (
SELECT
ARRAY_AGG(STRUCT(Source,FirstName, LastName, MiddleName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, Attom_ID, GEOID, Score, Score1, Score2, Score3, Score4, Score5, Latitude, Longitude) ORDER BY Attom_ID DESC LIMIT 1)[OFFSET(0)]
FROM `db.ds.tblA`
GROUP BY FirstName, LastName, MiddleName, Address, Address2, City, State, Zip
)
推荐阅读
- r - 如何在 heatmap.2 中进行颜色滚动
- terraform - 使用 Terraform 配置 Auth0 扩展
- sql - 如何在 sqlcmd 中动态命名 sql 文件的输出
- vagrant - Install Vagrant 2.2.6 on OSX 10.15.2 Catalina
- c - Multiple shared library constructors not being called
- sql - Fetch data from multiple child tables
- django - Django django.db.utils.OperationalError: FATAL: 剩余的连接槽是为非复制超级用户连接保留的
- java - 从 java 使用 REST API 的不同方式
- azure-devops - AzureDev Ops CI 版本未更新版本 (VersionPrefix)
- javascript - Compare an Array to an Array of Objects and Return a new Array of Objects with Grouped Dates