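moving-average - ClickHouse moving average
Problem description
Input: ClickHouse
Table A: business_dttm (datetime), amount (float)
I need to calculate a moving sum over 15 minutes (or over the last 3 records) at each business_dttm.
For example: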
amount  business_dttm        moving sum
0.3     2018-11-19 13:00:00
0.3     2018-11-19 13:05:00
0.4     2018-11-19 13:10:00  1
0.5     2018-11-19 13:15:00  1.2
0.6     2018-11-19 13:15:00  1.5
0.7     2018-11-19 13:20:00  1.8
0.8     2018-11-19 13:25:00  2.1
0.9     2018-11-19 13:25:00  2.4
0.5     2018-11-19 13:30:00  2.2
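Unfortunately, ClickHouse has neither window functions nor joins without an equality condition.
How can I do this without a cross join and a filter condition?
Solution
If the window size is very small, you can do something like this: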
SELECT
    sum(window.2) AS amount,        -- the other two tuples in each group carry 0
    max(dttm) AS business_dttm,
    sum(amt) AS moving_sum
FROM
(
    SELECT
        -- fan each row out to its own row number and the next two,
        -- so that group n collects rows n-2, n-1 and n
        arrayJoin([(rowNumberInAllBlocks(), amount), (rowNumberInAllBlocks() + 1, 0), (rowNumberInAllBlocks() + 2, 0)]) AS window,
        amount AS amt,
        business_dttm AS dttm
    FROM
    (
        SELECT
            amount,
            business_dttm
        FROM A
        ORDER BY business_dttm
    )
)
GROUP BY window.1
HAVING count() = 3    -- keep only complete windows
ORDER BY window.1;
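The first two rows are omitted (the HAVING count() = 3 filter drops incomplete windows), because ClickHouse does not collapse aggregates to NULL. You can add them back later.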
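To make the fan-out idea easier to follow, here is a minimal Python sketch that emulates what the query above does. It is not ClickHouse code: the function name moving_sum_fanout and the in-memory list are assumptions made for illustration. Each row n is copied into groups n, n + 1 and n + 2, and only groups that received a full window are kept.

```python
from collections import defaultdict


def moving_sum_fanout(amounts, window_size=3):
    """Emulate the arrayJoin fan-out (hypothetical helper, not a ClickHouse API).

    Row n contributes its amount to groups n .. n + window_size - 1,
    so group n ends up holding rows n - window_size + 1 .. n.
    """
    groups = defaultdict(list)
    for n, amount in enumerate(amounts):  # n plays the role of rowNumberInAllBlocks()
        for offset in range(window_size):
            groups[n + offset].append(amount)

    # Equivalent of HAVING count() = window_size: drop incomplete windows
    # at the start and the spill-over groups past the last row.
    return [
        (key, round(sum(values), 10))
        for key, values in sorted(groups.items())
        if len(values) == window_size
    ]


amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
for row_number, total in moving_sum_fanout(amounts):
    print(row_number, total)
```

With the amounts from the question, this should reproduce the moving sum column for rows 2 through 8. Note that the fan-out multiplies the number of rows by the window size, which is why the approach only suits very small windows.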
Update:
It is still possible to compute a moving sum for an arbitrary window size. Adjust window_size as needed (3 in this example).
-- Note: rowNumberInAllBlocks() gives incorrect results if declared inside the WITH block, because it is stateful
WITH
    (
        -- cumulative sums of amount in business_dttm order
        SELECT arrayCumSum(groupArray(amount))
        FROM
        (
            SELECT
                amount
            FROM A
            ORDER BY business_dttm
        )
    ) AS arr,
    3 AS window_size
SELECT
    amount,
    business_dttm,
    -- window sum = cumulative sum at this row minus cumulative sum window_size rows earlier
    if(rowNumberInAllBlocks() + 1 < window_size, NULL, arr[rowNumberInAllBlocks() + 1] - arr[rowNumberInAllBlocks() + 1 - window_size]) AS moving_sum
FROM
(
    SELECT
        amount,
        business_dttm
    FROM A
    ORDER BY business_dttm
)
Or this variant:
SELECT
    amount,
    business_dttm,
    moving_sum
FROM
(
    WITH 3 AS window_size
    SELECT
        groupArray(amount) AS amount_arr,
        groupArray(business_dttm) AS business_dttm_arr,
        arrayCumSum(amount_arr) AS amount_cum_arr,
        arrayMap(i -> if(i < window_size, NULL, amount_cum_arr[i] - amount_cum_arr[(i - window_size)]), arrayEnumerate(amount_cum_arr)) AS moving_sum_arr
    FROM
    (
        SELECT *
        FROM A
        ORDER BY business_dttm ASC
    )
)
ARRAY JOIN
    amount_arr AS amount,
    business_dttm_arr AS business_dttm,
    moving_sum_arr AS moving_sum
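Both arbitrary-window queries rest on the same identity: the sum of the last window_size values equals the cumulative sum at the current row minus the cumulative sum window_size rows earlier. A minimal Python sketch of that identity follows. The function name moving_sum_prefix is an assumption made for illustration, and the explicit leading 0 stands in for what the SQL appears to rely on, namely that an out-of-range array index yields the default value 0 for the first complete window.

```python
from itertools import accumulate


def moving_sum_prefix(amounts, window_size=3):
    """Moving sum via prefix sums (hypothetical helper mirroring the arrayCumSum queries)."""
    # prefix[i] is the sum of the first i amounts; prefix[0] = 0 is added explicitly.
    prefix = [0.0] + list(accumulate(amounts))

    result = []
    for i in range(1, len(amounts) + 1):  # i is 1-based, like ClickHouse array indices
        if i < window_size:
            result.append(None)  # not enough rows yet, matches the NULL branch
        else:
            result.append(prefix[i] - prefix[i - window_size])
    return result


amounts = [0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.5]
print(moving_sum_prefix(amounts))
```

Unlike the fan-out approach, the work per row does not grow with the window size, although the whole column still has to fit into a single array. The differences are floating-point values, so expect small rounding errors rather than exact decimals.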
Fair warning: both approaches are far from optimal, but they showcase ClickHouse's unique capabilities beyond standard SQL.