clickhouse - 使用 ClickHouse 折叠重叠的时间间隔
问题描述
我阅读了类似的问题,并且可以通过使用窗口功能使其工作,但是,由于 ClickHouse 似乎不支持它们,我正在寻找替代解决方案。
给定像 (1, 5), (2, 3), (3, 8), (10, 15) 这样的时间间隔,我想将重叠的间隔“合并”成单个的间隔。在我的示例中,它将是:
(1, 8) 和 (10, 15)。
任何指针表示赞赏!谢谢!
解决方案
如果此函数已与任意 lambda 一起使用,则 arrayReduce很容易解决此任务。虽然这不存在,但请尝试通过可用的方式解决问题。
SELECT
intervals,
arraySort(x -> x, intervals) sortedIntervals,
/* try to merge each interval with precede ones */
arrayMap((x, index) -> index != 1
? (arrayReduce(
'min',
arrayMap(
i -> sortedIntervals[i + 1].1,
/* get indexes of intervals that can be merged with the current one (index is zero-based) */
arrayFilter(
i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1,
range(index)))),
arrayReduce(
'max',
arrayMap(
i -> sortedIntervals[i + 1].2,
/* get indexes of intervals that can be merged with the current one (index is zero-based) */
arrayFilter(
i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1,
range(index)))))
: x,
sortedIntervals,
arrayEnumerate(sortedIntervals)) rawResult,
/* filter out intervals nested to other ones */
arrayFilter(
(x, index) -> index == length(rawResult) OR x.1 != rawResult[index + 1].1,
rawResult,
arrayEnumerate(rawResult)) result
FROM
(
SELECT [(1, 5), (2, 3), (3, 8), (10, 15)] intervals
UNION ALL
SELECT [(2, 4), (1, 3), (3, 6), (12, 14), (7, 7), (13, 16), (9, 9), (8, 9), (10, 15)]
UNION ALL
SELECT [(20, 22), (18, 18), (16, 21), (1, 8), (2, 9), (3, 5), (10, 12), (11, 13), (14, 15)]
UNION ALL
SELECT []
UNION ALL
SELECT [(1, 11)]
)
FORMAT Vertical;
/*
Row 1:
──────
intervals: [(2,4),(1,3),(3,6),(12,14),(7,7),(13,16),(9,9),(8,9),(10,15)]
sortedIntervals: [(1,3),(2,4),(3,6),(7,7),(8,9),(9,9),(10,15),(12,14),(13,16)]
rawResult: [(1,3),(1,4),(1,6),(7,7),(8,9),(8,9),(10,15),(10,15),(10,16)]
result: [(1,6),(7,7),(8,9),(10,16)]
Row 2:
──────
intervals: [(1,5),(2,3),(3,8),(10,15)]
sortedIntervals: [(1,5),(2,3),(3,8),(10,15)]
rawResult: [(1,5),(1,5),(1,8),(10,15)]
result: [(1,8),(10,15)]
Row 3:
──────
intervals: [(20,22),(18,18),(16,21),(1,8),(2,9),(3,5),(10,12),(11,13),(14,15)]
sortedIntervals: [(1,8),(2,9),(3,5),(10,12),(11,13),(14,15),(16,21),(18,18),(20,22)]
rawResult: [(1,8),(1,9),(1,9),(10,12),(10,13),(14,15),(16,21),(16,21),(16,22)]
result: [(1,9),(10,13),(14,15),(16,22)]
Row 4:
──────
intervals: []
sortedIntervals: []
rawResult: []
result: []
Row 5:
──────
intervals: [(1,11)]
sortedIntervals: [(1,11)]
rawResult: [(1,11)]
result: [(1,11)]
*/
推荐阅读
- node.js - 在heroku上托管时显示json数据
- python - 我的 Tflite 模型在 jupyter notebook 中完美运行。但是,当我尝试在 Android Studio 中运行它时显示错误的输出
- java - InputMismatchException 帮助,如何覆盖?
- linux - 当gdb附加到tracee时tracee会得到SIGSTOP做什么
- pytorch - Google Colab 最近引发错误 ModuleNotFoundError: No module named 'google.cloud.storage.retry'
- python - 从列类型对象 pandas 中删除为空的行
- php - Laravel 控制器存储以驼峰格式接收的数据
- python - 在 Plotly Dash 中结合 DateRangePicker 和 Dropdown
- spring-webflux - Spring Webflux HttpResponse
- java - 为什么 @OnDelete(action = OnDeleteAction.CASCADE) 在删除 InheritanceType.JOINED 时不删除父表条目