首页 > 解决方案 > 使用 ClickHouse 折叠重叠的时间间隔

问题描述

我阅读了类似的问题,并且可以通过使用窗口功能使其工作,但是,由于 ClickHouse 似乎不支持它们,我正在寻找替代解决方案。

给定像 (1, 5), (2, 3), (3, 8), (10, 15) 这样的时间间隔,我想将重叠的间隔“合并”成单个的间隔。在我的示例中,它将是:

(1, 8) 和 (10, 15)。

任何指针表示赞赏!谢谢!

标签: clickhouse

解决方案


如果此函数已与任意 lambda 一起使用,则 arrayReduce很容易解决此任务。虽然这不存在,但请尝试通过可用的方式解决问题。

SELECT
    intervals,

    arraySort(x -> x, intervals) sortedIntervals,

    /* try to merge each interval with precede ones */
    arrayMap((x, index) -> index != 1
        ? (arrayReduce(
            'min', 
            arrayMap(
              i -> sortedIntervals[i + 1].1, 
              /* get indexes of intervals that can be merged with the current one (index is zero-based) */              
              arrayFilter(
                i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1, 
                range(index)))),
          arrayReduce(
            'max', 
            arrayMap(
              i -> sortedIntervals[i + 1].2,  
              /* get indexes of intervals that can be merged with the current one (index is zero-based) */              
              arrayFilter(
                i -> x.1 <= sortedIntervals[i + 1].2 AND x.2 >= sortedIntervals[i + 1].1, 
                range(index)))))
        : x,
      sortedIntervals, 
      arrayEnumerate(sortedIntervals)) rawResult,

    /* filter out intervals nested to other ones */
    arrayFilter(
      (x, index) -> index == length(rawResult) OR x.1 != rawResult[index + 1].1,
      rawResult, 
      arrayEnumerate(rawResult)) result
FROM
(
    SELECT [(1, 5), (2, 3), (3, 8), (10, 15)] intervals
    UNION ALL
    SELECT [(2, 4), (1, 3), (3, 6), (12, 14), (7, 7), (13, 16), (9, 9), (8, 9), (10, 15)]
    UNION ALL
    SELECT [(20, 22), (18, 18), (16, 21), (1, 8), (2, 9), (3, 5), (10, 12), (11, 13), (14, 15)]
    UNION ALL
    SELECT []
    UNION ALL 
    SELECT [(1, 11)]
)
FORMAT Vertical;

/*
Row 1:
──────
intervals:       [(2,4),(1,3),(3,6),(12,14),(7,7),(13,16),(9,9),(8,9),(10,15)]
sortedIntervals: [(1,3),(2,4),(3,6),(7,7),(8,9),(9,9),(10,15),(12,14),(13,16)]
rawResult:       [(1,3),(1,4),(1,6),(7,7),(8,9),(8,9),(10,15),(10,15),(10,16)]
result:          [(1,6),(7,7),(8,9),(10,16)]

Row 2:
──────
intervals:       [(1,5),(2,3),(3,8),(10,15)]
sortedIntervals: [(1,5),(2,3),(3,8),(10,15)]
rawResult:       [(1,5),(1,5),(1,8),(10,15)]
result:          [(1,8),(10,15)]

Row 3:
──────
intervals:       [(20,22),(18,18),(16,21),(1,8),(2,9),(3,5),(10,12),(11,13),(14,15)]
sortedIntervals: [(1,8),(2,9),(3,5),(10,12),(11,13),(14,15),(16,21),(18,18),(20,22)]
rawResult:       [(1,8),(1,9),(1,9),(10,12),(10,13),(14,15),(16,21),(16,21),(16,22)]
result:          [(1,9),(10,13),(14,15),(16,22)]

Row 4:
──────
intervals:       []
sortedIntervals: []
rawResult:       []
result:          []

Row 5:
──────
intervals:       [(1,11)]
sortedIntervals: [(1,11)]
rawResult:       [(1,11)]
result:          [(1,11)]
*/

推荐阅读