clickhouse - 为什么不在合并树节点上执行数组连接
问题描述
选择 uniq(uid,sid) 作为值,l.1 从 ,l.2 到 从 ( 选择 uid,sid,s_t 从 ( 选择 distinct_id 作为 uid, arraySort((x)->x.1,groupArray(tuple(toUnixTimestamp(ums_ts_),toString(event_id)))) as cur, arrayDifference((x)->x.1,cur) 作为 cur_diff, 数组PushBack( 数组过滤器( (x,y)->y>1800, 数组枚举(cur_diff), cur_diff ), 长度(当前)+1 ) 作为 cur_split, arrayFilter((x)->长度(x)>0, arrayMap((x)->arrayMap((x)->x.2,arraySlice(x,arrayFirstIndex((y)->y.2='1301',x))), arrayMap((x,y)->arraySlice(cur, multiIf(y==1,1,cur_split[y-1]),multiIf(y==1,cur_split[y]-1,cur_split[y]-cur_split[y-1])),cur_split,arrayEnumerate(cur_split )))) 作为 t, arrayMap((x)->arrayMap((y,z,q)->tuple(concat(toString(y),'_$$_',z),concat(multiIf(y==length(arrayEnumerate(x) ),'',toString(y+1)),'_$$_',q)),arrayEnumerate(x),x,arrayPushBack(arrayPopFront(x),'_waste')),t) 作为 tx 来自 event_data.event_wos_p15 其中 event_id in (1301,1310,1303,1305,1429) and event_date>='2020-03-01' and event_date <='2020-03-31' group by distinct_id ) array join tx as s_t,arrayEnumerate(tx) as sid ) array join s_t as l group by from ,to
检查 system.query_log 表发现在分布式节点上执行了数组连接。为什么不在合并树节点上执行数组连接?
合并树节点 query_log
type: QueryFinish
event_date: 2020-04-27
event_time: 2020-04-27 15:34:54
query_start_time: 2020-04-27 15:34:53
query_duration_ms: 628
read_rows: 4955184
read_bytes: 355066855
written_rows: 0
written_bytes: 0
result_rows: 76798
result_bytes: 4636864
memory_usage: 660752320
query: SELECT distinct_id AS uid, arrayMap(x -> arrayMap((y, z, q) -> (concat(toString(y), '_$$_', z), concat(multiIf(y = length(arrayEnumerate(x)), '', toString(y + 1)), '_$$_', q)), arrayEnumerate(x), x, arrayPushBack(arrayPopFront(x), '_waste')), arrayFilter(x -> (length(x) > 0), arrayMap(x -> arrayMap(x -> (x.2), arraySlice(x, arrayFirstIndex(y -> ((y.2) = '1301'), x))), arrayMap((x, y) -> arraySlice(arraySort(x -> (x.1), groupArray((toUnixTimestamp(ums_ts_), toString(event_id)))) AS cur, multiIf(y = 1, 1, (arrayPushBack(arrayFilter((x, y) -> (y > 1800), arrayEnumerate(arrayDifference(x -> (x.1), cur) AS cur_diff), cur_diff), length(cur) + 1) AS cur_split)[y - 1]), multiIf(y = 1, (cur_split[y]) - 1, (cur_split[y]) - (cur_split[y - 1]))), cur_split, arrayEnumerate(cur_split)))) AS t) AS tx
FROM event_data.event_wos_p15 WHERE (event_id IN (1301, 1310, 1303, 1305, 1429)) AND (event_date >= '2020-03-01') AND (event_date <= '2020-03-31') GROUP BY distinct_id
解决方案
select
from (
select xxx,
from distributed_table
group by
)
只有查询的内部部分from distributed_table
将在 shads 上(在 MergeTree 表上)执行,所有其他部分( )将在启动器节点上执行。
推荐阅读
- vba - VBA 命令 CopyDatabaseFile 命中运行时错误“2046”
- angular - 嵌套一个组件,需要从该组件获取一个属性值给父级并设置另一个
- javascript - 如何使用 React-Redux 从注册页面将数据更新到 MongoDB?
- database - 如何获取 DolphinDB 中所有数据节点的别名
- c++ - 使用 strcmp() 比较 c 样式字符数组时不匹配
- python - 对python中用户提供的列表进行排序?
- javascript - Jquery自动触发点击事件不起作用
- android - Android 应用上传图片和服务器上的 URL
- python-3.x - 使用 pythonwekawrapper3 后获取与结果相关的特定值
- xcode - React Native 和 xcode - 命令 CodeSign 失败,退出代码为非零