首页 > 解决方案 > 如果我在clickhouse中只有设备的在线和离线事件记录,如何汇总每小时的在线计数

问题描述

如果我有设备的在线和离线事件记录。

CREATE TABLE IF NOT EXISTS connect_log (
 `deviceId` Int64,
 `eventTime` Datetime() ,
 online Int8
 )  ENGINE = MergeTree
 ORDER BY (`deviceId`,`eventTime`);

如何编写一个汇总每小时在线设备数量的物化视图。

在线计数不仅是在线事件的计数,它还必须包括该小时内曾经在线的所有设备(最后一个事件不是离线即使最后一个在线事件是在很多天之前记录的或者最后一个离线事件是在这个小时)

测试用例

插入1

INSERT into connect_log values 
(1,'2020-07-30 10:01:00',1),
(2,'2020-07-30 10:01:0',1),
(3,'2020-07-30 11:01:00',1);

结果1:

............................
2020-07-30 10:00:00| 2
............................
2020-07-30 11:00:00| 3
............................

然后插入2

INSERT into connect_log values 
(1,'2020-07-30 11:51:00',0),
(2,'2020-07-30 12:01:0',0),
(3,'2020-07-30 13:51:00',0);

结果1:

............................
2020-07-30 10:00:00|  2
............................
2020-07-30 11:00:00| 3
............................
2020-07-30 12:00:00| 2
............................
2020-07-30 13:00:00| 1
............................

标签: clickhouse

解决方案


试试这个方法:

CREATE MATERIALIZED VIEW connect_log_hourly_mv
ENGINE = AggregatingMergeTree()
PARTITION BY tuple()
ORDER BY (hourlyEventTime) 
AS
SELECT  
  toStartOfHour(eventTime) AS hourlyEventTime, 
  uniqExactIfState(deviceId, online = 1) AS onlineDeviceUniqCountState
  /*
  .. other aggregates ..
  */
FROM connect_log
GROUP BY hourlyEventTime;

查询以获取聚合结果:

SELECT 
    hourlyEventTime, 
    uniqExactIfMerge(onlineDeviceUniqCountState) onlineDeviceUniqCount
FROM connect_log_hourly_mv
GROUP BY hourlyEventTime
HAVING onlineDeviceUniqCount > 0
ORDER BY hourlyEventTime;

更新

物化视图计算用于计算最终结果的初步聚合(一小时内最早事件的状态):

CREATE MATERIALIZED VIEW connect_log_hourly_mv
ENGINE = AggregatingMergeTree()
PARTITION BY tuple()
ORDER BY (hourlyEventTime) 
AS
SELECT  
  toStartOfHour(eventTime) AS hourlyEventTime, 
  deviceId,
  argMinState(online, eventTime) AS firstEventStatusState
FROM connect_log
GROUP BY hourlyEventTime, deviceId;

询问:

SELECT 
  hourlyEventTime, 
  countIf(deviceId, firstEventStatus = 1) AS uniq_count
FROM (
  SELECT 
    hourlyEventTime, 
    deviceId, 
    argMinMerge(firstEventStatusState) firstEventStatus
  FROM connect_log_hourly_mv
  GROUP BY hourlyEventTime, deviceId)
GROUP BY hourlyEventTime
ORDER BY hourlyEventTime

推荐阅读