首页 > 解决方案 > Clickhouse 生成两个日期和间隔之间的日期数组

问题描述

我最近一直在使用 Clickhouse 提出很多问题,希望有人能帮我摆脱这种痛苦。

我的数据库中有多个日期,

前任。我的数据库在 5 月 (2020-05-01 ~ 2020-05-31) 有每天的日期

然后我想将 1 日和 31 日设置为开始日期和结束日期

并以一定的间隔获取数组中日期的 SELECT 结果

例如。

SELECT (some query about setting 2 timestamps as start / end and interval of 5 days)

那么预期的结果将是

2020-05-01
2020-05-05
2020-05-10
2020-05-15 . . .goes on till 30

我希望这个间隔是月、日、小时、分钟、秒、毫秒

但是据我对clickhouse的研究,我们真的不能在clickhouse中使用毫秒吗????

如果是这样,我是否必须将日期转换为 UInt64,然后做一些间隔技巧来获得结果 UInt64,然后将其转换为 datetime ???

请帮我 :(

输入数据示例:

start_at
2020-01-14 18:04:36.000
2020-01-14 19:07:48.000
2020-01-14 20:46:48.000
2020-01-14 23:21:12.000
2020-01-15 00:02:00.000
2020-01-15 03:36:00.000
2020-01-15 04:54:24.000
2020-01-15 08:04:00.000
2020-01-15 09:04:00.000
2020-01-15 10:04:00.000
2020-01-15 11:04:00.000
2020-01-15 14:04:00.000
2020-01-15 18:04:00.000
2020-01-16 11:04:00.000
2020-01-16 17:04:00.000
2020-01-16 17:04:00.000
2020-01-17 11:04:00.000
2020-01-17 18:04:00.000
2020-01-17 20:04:00.000
2020-01-18 01:04:00.000
2020-01-18 15:04:00.000

预期结果(例如间隔 2 天)

    time                        count
2020-01-14 18:04:36.000
2020-01-16 18:04:36.000
2020-01-18 18:04:36.000 

或间隔 1 天

         time                       count
2020-01-14 18:04:36.000
2020-01-15 18:04:36.000
2020-01-16 18:04:36.000
2020-01-17 18:04:36.000
2020-01-18 18:04:36.000

或 12 小时

             time                       count
2020-01-14 18:04:36.000
2020-01-15 06:04:36.000
2020-01-15 18:04:36.000
2020-01-16 06:04:36.000
2020-01-16 18:04:36.000
2020-01-17 06:04:36.000
2020-01-17 18:04:36.000
2020-01-18 06:04:36.000
2020-01-18 18:04:36.000

标签: clickhouse

解决方案


试试这个查询:

WITH
    toDateTime64('2020-01-14 18:04:36.000', 3) AS start_from,
    toUnixTimestamp64Milli(start_from) AS start_from_ts,
    ((12 * 60) * 60) * 1000 AS interval_msec
SELECT
  fromUnixTimestamp64Milli(toInt64(ts)) dt,
  count
FROM (    
  SELECT
      start_from_ts + interval_msec * interval_number AS ts,
      floor((toUnixTimestamp64Milli(start_at) - start_from_ts) / interval_msec) AS interval_number,
      count() AS count
  FROM 
  (
    /* emulate the test dataset */
    SELECT toDateTime64(dt, 3) AS start_at
      FROM (
        SELECT arrayJoin([
          ('2020-01-14 18:04:36.000'),
          ('2020-01-14 19:07:48.000'),
          ('2020-01-14 20:46:48.000'),
          ('2020-01-14 23:21:12.000'),
          ('2020-01-15 00:02:00.000'),
          ('2020-01-15 03:36:00.000'),
          ('2020-01-15 04:54:24.000'),
          ('2020-01-15 08:04:00.000'),
          ('2020-01-15 09:04:00.000'),
          ('2020-01-15 10:04:00.000'),
          ('2020-01-15 11:04:00.000'),
          ('2020-01-15 14:04:00.000'),
          ('2020-01-15 18:04:00.000'),
          ('2020-01-16 11:04:00.000'),
          ('2020-01-16 17:04:00.000'),
          ('2020-01-16 17:04:00.000'),
          ('2020-01-17 11:04:00.000'),
          ('2020-01-17 18:04:00.000'),
          ('2020-01-17 20:04:00.000'),
          ('2020-01-18 01:04:00.000'),
          ('2020-01-18 15:04:00.000')]) dt)
    )
  WHERE start_at >= start_from
  GROUP BY interval_number
  ORDER BY ts WITH FILL FROM toUnixTimestamp64Milli(toDateTime64('2020-01-14 18:04:36.000', 3)) TO toUnixTimestamp64Milli(toDateTime64('2020-01-18 18:04:36.000', 3)) STEP ((12 * 60) * 60) * 1000
  )

/* result
┌──────────────────────dt─┬─count─┐
│ 2020-01-14 18:04:36.000 │     7 │
│ 2020-01-15 06:04:36.000 │     6 │
│ 2020-01-15 18:04:36.000 │     0 │
│ 2020-01-16 06:04:36.000 │     3 │
│ 2020-01-16 18:04:36.000 │     0 │
│ 2020-01-17 06:04:36.000 │     2 │
│ 2020-01-17 18:04:36.000 │     2 │
│ 2020-01-18 06:04:36.000 │     1 │
└─────────────────────────┴───────┘
*/

上面的查询计算间隔 12 hours的值。

要将其应用于其他间隔需要修改:


推荐阅读