database - Greenplum query cost does not match query analysis
Problem description
Where is the real cost of a Greenplum query? I have a simple SQL query that returns in 80 ms on SQL Server but takes 1,500 ms on Greenplum!
EXPLAIN ANALYZE shows "Total runtime: 55 ms", but that is not what I observe: connecting to the (Greenplum) database through either an ODBC driver or the psql client, the query consistently takes about 1,500 ms.
No table holds more than 10K rows; the tables are partitioned, with no skew (and no indexes). Average CPU/IO/memory/network utilization on every node (master and segments) is below 10%.
Questions:
- Where is the real cost?
- Where can I find it?
- Am I missing some key point?
SQL:
SELECT CoreHR_EmploymentRecord.ChangeTypeOID AS CoreHR_EmploymentRecord_ChangeTypeOID,
MAX(CoreHR_EmploymentRecord.ChangeTypeAlias) AS CoreHR_EmploymentRecord_ChangeTypeAlias,
COUNT(DISTINCT CoreHR_EmployeeInformation.UserID) AS CoreHR_EmployeeInformation_Count_UserID
FROM CoreHR_EmployeeInformation WITH (NOLOCK)
LEFT JOIN CoreHR_EmploymentRecord WITH (NOLOCK)
ON CoreHR_EmployeeInformation.TenantId = CoreHR_EmploymentRecord.TenantId
AND CoreHR_EmployeeInformation.UserID = CoreHR_EmploymentRecord.UserID
AND CoreHR_EmploymentRecord.TenantId = 106996
WHERE CoreHR_EmployeeInformation.TenantId = 106996
AND (CoreHR_EmploymentRecord.IsCurrentRecord = 1)
AND (CoreHR_EmploymentRecord.StartDate
BETWEEN '2018-05-24 00:00:00' AND '2018-06-22 23:59:59.998'
)
AND CoreHR_EmployeeInformation.ApprovalStatus IN
(
SELECT * FROM dbo.f_SplitToVarchar(4, ',')
)
GROUP BY CoreHR_EmploymentRecord.ChangeTypeOID
ORDER BY CoreHR_EmploymentRecord.ChangeTypeOID ASC;
Execution plan (no indexes):
"Gather Motion 3:1 (slice4; segments: 3) (cost=0.00..865.47 rows=18 width=18)"
" Merge Key: record.changetypeoid"
" Rows out: 8 rows at destination with 13 ms to end."
" -> GroupAggregate (cost=0.00..865.47 rows=6 width=18)"
" Group By: record.changetypeoid"
" Rows out: Avg 2.7 rows x 3 workers. Max 4 rows (seg0) with 11 ms to end."
" Executor memory: 8K bytes avg, 8K bytes max (seg0)."
" -> Sort (cost=0.00..865.47 rows=6 width=18)"
" Sort Key: record.changetypeoid"
" Sort Method: quicksort Max Memory: 33KB Avg Memory: 33KB (3 segments)"
" Rows out: Avg 6.7 rows x 3 workers. Max 12 rows (seg0) with 11 ms to end."
" Executor memory: 58K bytes avg, 58K bytes max (seg0)."
" Work_mem used: 58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling)"
" -> Redistribute Motion 3:3 (slice3; segments: 3) (cost=0.00..865.47 rows=6 width=18)"
" Hash Key: record.changetypeoid"
" Rows out: Avg 6.7 rows x 3 workers at destination. Max 12 rows (seg0) with 10 ms to end."
" -> Result (cost=0.00..865.47 rows=6 width=18)"
" Rows out: Avg 6.7 rows x 3 workers. Max 7 rows (seg1) with 7.992 ms to first row, 8.138 ms to end."
" -> GroupAggregate (cost=0.00..865.47 rows=6 width=18)"
" Group By: record.changetypeoid"
" Rows out: Avg 6.7 rows x 3 workers. Max 7 rows (seg1) with 7.990 ms to first row, 8.131 ms to end."
" Executor memory: 248K bytes avg, 258K bytes max (seg1)."
" Work_mem used: 58K bytes avg, 58K bytes max (seg0)."
" -> Sort (cost=0.00..865.47 rows=41 width=18)"
" Sort Key: record.changetypeoid"
" Sort Method: quicksort Max Memory: 49KB Avg Memory: 38KB (3 segments)"
" Rows out: Avg 48.7 rows x 3 workers. Max 63 rows (seg2) with 7.598 ms to first row, 7.603 ms to end."
" Executor memory: 58K bytes avg, 58K bytes max (seg0)."
" Work_mem used: 58K bytes avg, 58K bytes max (seg0). Workfile: (0 spilling)"
" -> Redistribute Motion 3:3 (slice2; segments: 3) (cost=0.00..865.45 rows=41 width=18)"
" Hash Key: info.userid"
" Rows out: Avg 48.7 rows x 3 workers at destination. Max 63 rows (seg2) with 7.178 ms to first row, 7.193 ms to end."
" -> Hash Join (cost=0.00..865.44 rows=41 width=18)"
" Hash Cond: info.tenantid = record.tenantid AND info.userid = record.userid"
" Rows out: Avg 48.7 rows x 3 workers. Max 50 rows (seg1) with 4.240 ms to first row, 4.995 ms to end."
" Executor memory: 7K bytes avg, 7K bytes max (seg0)."
" Work_mem used: 7K bytes avg, 7K bytes max (seg0). Workfile: (0 spilling)"
" (seg1) Hash chain length 1.0 avg, 2 max, using 141 of 131072 buckets."
" -> Dynamic Table Scan on info (dynamic scan id: 1) (cost=0.00..431.94 rows=258 width=10)"
" Filter: tenantid = 106996 AND approvalstatus::text = '4'::text"
" Rows out: Avg 772.7 rows x 3 workers. Max 783 rows (seg2) with 1.668 ms to first row, 1.836 ms to end."
" Partitions scanned: Avg 1.0 (out of 286) x 3 workers. Max 1 parts (seg0)."
" -> Hash (cost=100.00..100.00 rows=34 width=4)"
" Rows in: Avg 146.0 rows x 3 workers. Max 146 rows (seg0) with 5.202 ms to end, start offset by 34 ms."
" -> Partition Selector for info (dynamic scan id: 1) (cost=10.00..100.00 rows=34 width=4)"
" Rows out: Avg 146.0 rows x 3 workers. Max 146 rows (seg0) with 4.725 ms to first row, 5.104 ms to end."
" -> Broadcast Motion 3:3 (slice1; segments: 3) (cost=0.00..433.31 rows=144 width=22)"
" Rows out: Avg 146.0 rows x 3 workers at destination. Max 146 rows (seg0) with 3.820 ms to first row, 3.934 ms to end."
" -> Result (cost=0.00..433.25 rows=48 width=22)"
" Rows out: Avg 48.7 rows x 3 workers. Max 53 rows (seg2) with 4.080 ms to first row, 4.593 ms to end."
" -> Sequence (cost=0.00..433.25 rows=48 width=30)"
" Rows out: Avg 48.7 rows x 3 workers. Max 53 rows (seg2) with 4.078 ms to first row, 4.582 ms to end."
" -> Partition Selector for record (dynamic scan id: 2) (cost=10.00..100.00 rows=34 width=4)"
" Partitions selected: 1 (out of 128)"
" Rows out: 0 rows (seg0) with 0.018 ms to end."
" -> Dynamic Table Scan on record (dynamic scan id: 2) (cost=0.00..433.25 rows=48 width=30)"
" Filter: startdate >= '2018-05-24'::date AND startdate <= '2018-06-22'::date AND tenantid = 106996 AND iscurrentrecord = 1"
" Rows out: Avg 48.7 rows x 3 workers. Max 53 rows (seg2) with 4.060 ms to first row, 4.542 ms to end."
" Partitions scanned: Avg 1.0 (out of 128) x 3 workers. Max 1 parts (seg0)."
"Slice statistics:"
" (slice0) Executor memory: 880K bytes."
" (slice1) Executor memory: 1070K bytes avg x 3 workers, 1070K bytes max (seg0)."
" (slice2) Executor memory: 2844K bytes avg x 3 workers, 2844K bytes max (seg0). Work_mem: 7K bytes max."
" (slice3) Executor memory: 282K bytes avg x 3 workers, 293K bytes max (seg2). Work_mem: 58K bytes max."
" (slice4) Executor memory: 219K bytes avg x 3 workers, 219K bytes max (seg0). Work_mem: 58K bytes max."
"Statement statistics:"
" Memory used: 50176K bytes"
"Optimizer status: PQO version 2.62.0"
"Total runtime: 51.404 ms"
Solution
Greenplum Database performs very fast sequential scans; indexes use a random-seek pattern to locate records on disk.
Greenplum data is distributed across the segments, so each segment scans only a small portion of the overall data to produce the result. With table partitioning, the total data to scan can be smaller still.
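The distribution idea can be sketched in a few lines. This is only an illustration of the concept — it does not use Greenplum's actual hash function, and the table shape (a tenantid/userid pair distributed by userid) is an assumption borrowed from the query above:

```python
# Minimal illustration of hash distribution across segments.
# NOT Greenplum's actual hash function -- just the idea that each row
# lands on exactly one segment, chosen by its distribution key, so a
# scan on any one segment touches only that segment's share of the data.

NUM_SEGMENTS = 3

def segment_for(distribution_key):
    """Map a distribution key to one segment (illustrative only)."""
    return hash(distribution_key) % NUM_SEGMENTS

# Pretend rows of a table distributed by userid: (tenantid, userid)
rows = [(106996, user_id) for user_id in range(30)]

per_segment = {seg: 0 for seg in range(NUM_SEGMENTS)}
for tenantid, userid in rows:
    per_segment[segment_for(userid)] += 1

print(per_segment)  # roughly even: each segment holds ~1/3 of the rows
```

With an even spread like this, each of the 3 segments scans about a third of the table in parallel, which is why sequential scans over small, well-distributed tables are cheap without indexes.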
Because business-intelligence query workloads typically return very large result sets, using indexes is not efficient.
First try your query workload without adding any indexes.
Are you using ORCA or the legacy planner?
On the timing question:
Where are you generating the timings from? A remote client, or the psql console? EXPLAIN ANALYZE shows the actual processing time of the query.
You will need to post an EXPLAIN ANALYZE output before any real analysis of the plan can be done.
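One way to see where the missing ~1,400 ms goes is to time the query from the client and subtract the server-reported runtime; the remainder is spent outside the executor (connecting, parsing and planning, result serialization, network round trips), which EXPLAIN ANALYZE's "Total runtime" does not include. A minimal sketch, with the database call stubbed out so the snippet runs standalone — `run_query` and its timings are illustrative placeholders, not measurements:

```python
import time

def time_call(fn, *args):
    """Return (result, elapsed_ms): the latency as the client sees it."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

# Stub standing in for cursor.execute() + fetchall() against Greenplum;
# swap in a real DB-API call (e.g. via psycopg2) to measure your own query.
def run_query(sql):
    time.sleep(0.05)  # pretend the executor spends ~50 ms, as in the plan
    return [("row",)]

rows, client_ms = time_call(run_query, "SELECT ...")
server_ms = 51.404  # the "Total runtime" EXPLAIN ANALYZE reported above

# Whatever exceeds server_ms was spent outside the executor: connecting,
# parsing and planning (ORCA can plan noticeably slower than the legacy
# planner), result serialization, and network round trips.
print(f"client-observed {client_ms:.1f} ms vs server-reported {server_ms:.1f} ms")
```

In psql, `\timing on` gives the same client-side number directly; comparing it against the EXPLAIN ANALYZE total isolates whether the gap is planner overhead or transport.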