首页 > 解决方案 > 为什么不在合并树节点上执行数组连接

问题描述

选择 uniq(uid,sid) 作为值,l.1 从 ,l.2 到
从 (
选择 uid,sid,s_t
从 (
选择
distinct_id 作为 uid,
arraySort((x)->x.1,groupArray(tuple(toUnixTimestamp(ums_ts_),toString(event_id)))) as cur,
arrayDifference((x)->x.1,cur) 作为 cur_diff,
数组PushBack(
数组过滤器(
(x,y)->y>1800,
数组枚举(cur_diff),
cur_diff
),
长度(当前)+1
) 作为 cur_split,
arrayFilter((x)->长度(x)>0,
arrayMap((x)->arrayMap((x)->x.2,arraySlice(x,arrayFirstIndex((y)->y.2='1301',x))),
arrayMap((x,y)->arraySlice(cur,
multiIf(y==1,1,cur_split[y-1]),multiIf(y==1,cur_split[y]-1,cur_split[y]-cur_split[y-1])),cur_split,arrayEnumerate(cur_split )))) 作为 t,
arrayMap((x)->arrayMap((y,z,q)->tuple(concat(toString(y),'_$$_',z),concat(multiIf(y==length(arrayEnumerate(x) ),'',toString(y+1)),'_$$_',q)),arrayEnumerate(x),x,arrayPushBack(arrayPopFront(x),'_waste')),t) 作为 tx
来自 event_data.event_wos_p15 其中 event_id in (1301,1310,1303,1305,1429) and event_date>='2020-03-01' and event_date <='2020-03-31' group by distinct_id
) array join tx as s_t,arrayEnumerate(tx) as sid
) array join s_t as l group by from ,to
检查 system.query_log 表发现在分布式节点上执行了数组连接。为什么不在合并树节点上执行数组连接?

合并树节点 query_log


type:                 QueryFinish
event_date:           2020-04-27
event_time:           2020-04-27 15:34:54
query_start_time:     2020-04-27 15:34:53
query_duration_ms:    628
read_rows:            4955184
read_bytes:           355066855
written_rows:         0
written_bytes:        0
result_rows:          76798
result_bytes:         4636864
memory_usage:         660752320
query:                SELECT distinct_id AS uid, arrayMap(x -> arrayMap((y, z, q) -> (concat(toString(y), '_$$_', z), concat(multiIf(y = length(arrayEnumerate(x)), '', toString(y + 1)), '_$$_', q)), arrayEnumerate(x), x, arrayPushBack(arrayPopFront(x), '_waste')), arrayFilter(x -> (length(x) > 0), arrayMap(x -> arrayMap(x -> (x.2), arraySlice(x, arrayFirstIndex(y -> ((y.2) = '1301'), x))), arrayMap((x, y) -> arraySlice(arraySort(x -> (x.1), groupArray((toUnixTimestamp(ums_ts_), toString(event_id)))) AS cur, multiIf(y = 1, 1, (arrayPushBack(arrayFilter((x, y) -> (y > 1800), arrayEnumerate(arrayDifference(x -> (x.1), cur) AS cur_diff), cur_diff), length(cur) + 1) AS cur_split)[y - 1]), multiIf(y = 1, (cur_split[y]) - 1, (cur_split[y]) - (cur_split[y - 1]))), cur_split, arrayEnumerate(cur_split)))) AS t) AS tx 
FROM event_data.event_wos_p15 WHERE (event_id IN (1301, 1310, 1303, 1305, 1429)) AND (event_date >= '2020-03-01') AND (event_date <= '2020-03-31') GROUP BY distinct_id

标签: clickhouse

解决方案


select 
from ( 
   select xxx, 
   from distributed_table
   group by
   )

只有查询的内部部分from distributed_table将在 shads 上(在 MergeTree 表上)执行,所有其他部分( )将在启动器节点上执行。


推荐阅读