首页 > 解决方案 > 如何对不同表和不同过滤器的多个计数求和

问题描述

我想在不同的表中获得多个,每个box_id表中不同但具有相同的,datehoursumcount idstatusbox_id

例子

table 1
(filter by status = finished)


id   box_id              date                status
i     20      2019-01-01 01:00:00.000 UTC    finished
2     21      2019-01-01 02:00:00.000 UTC    finished
3     21      2019-01-01 01:00:00.000 UTC    unfinished


table 2
(filter by status = start)

id   box_id              date                status
i     21      2019-01-01 01:00:00.000 UTC    start
2     22      2019-01-01 02:00:00.000 UTC    end
3     23      2019-01-01 01:00:00.000 UTC    start
4     24      2019-01-01 01:00:00.000 UTC    start


table 3
(filter by status = close)


id   box_id              date                status
i     21      2019-01-01 03:00:00.000 UTC    close
2     22      2019-01-01 02:00:00.000 UTC    end
3     24      2019-01-01 01:00:00.000 UTC    close

result that i want:

box_id      date        hour       count
20         2019-01-01     1          1
21         2019-01-01     1          1
21         2019-01-01     2          1
21         2019-01-01     3          1
23         2019-01-01     1          1
24         2019-01-01     1          2

这是适用于表 1 的查询:我如何在 1 个表中获取所有内容?

select box_id, 
date(date_update), 
EXTRACT(hour FROM date_update) as hourly, 
count(id)
from table1
where status = "finished"
group by box_id, date(date_update), EXTRACT(hour FROM date_update)

格式小时 = 0 - 23

标签: mysqlsqlcountgoogle-bigquerysum

解决方案


假设您的date字段是 TIMESTAMP 数据类型 - 以下是 BigQuery 标准 SQL

#standardSQL
SELECT box_id, date, hour, COUNT(1) cnt
FROM (
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table1` WHERE status = 'finished' UNION ALL
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table2` WHERE status = 'start' UNION ALL
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table3` WHERE status = 'close'
)
GROUP BY box_id, date, hour

您可以使用您问题中的示例/虚拟数据进行测试,使用上面的示例,如下例所示

#standardSQL
WITH `project.dataset.table1` AS (
  SELECT 1 id, 20 box_id, TIMESTAMP '2019-01-01 01:00:00.000 UTC'date, 'finished' status UNION ALL
  SELECT 2, 21, '2019-01-01 02:00:00.000 UTC', 'finished' UNION ALL
  SELECT 3, 21, '2019-01-01 01:00:00.000 UTC', 'unfinished' 
), `project.dataset.table2` AS (
  SELECT 1 id, 21 box_id, TIMESTAMP '2019-01-01 01:00:00.000 UTC' date, 'start' status UNION ALL
  SELECT 2, 22, '2019-01-01 02:00:00.000 UTC', 'end' UNION ALL
  SELECT 3, 23, '2019-01-01 01:00:00.000 UTC', 'start' UNION ALL
  SELECT 4, 24, '2019-01-01 01:00:00.000 UTC', 'start' 
), `project.dataset.table3` AS (
  SELECT 1 id, 21 box_id, TIMESTAMP '2019-01-01 03:00:00.000 UTC' date, 'close' status UNION ALL
  SELECT 2, 22, '2019-01-01 02:00:00.000 UTC', 'end' UNION ALL
  SELECT 3, 24, '2019-01-01 01:00:00.000 UTC', 'close' 
)
SELECT box_id, date, hour, COUNT(1) cnt
FROM (
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table1` WHERE status = 'finished' UNION ALL
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table2` WHERE status = 'start' UNION ALL
  SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour 
  FROM `project.dataset.table3` WHERE status = 'close'
)
GROUP BY box_id, date, hour
-- ORDER BY box_id, date, hour   

结果

Row box_id  date        hour    cnt  
1   20      2019-01-01  1       1    
2   21      2019-01-01  1       1    
3   21      2019-01-01  2       1    
4   21      2019-01-01  3       1    
5   23      2019-01-01  1       1    
6   24      2019-01-01  1       2      

以下是相同的稍微重构的版本(显然具有相同的输出)

#standardSQL
SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour, 
  COUNTIF(
    (t = 1 AND status = 'finished') OR
    (t = 2 AND status = 'start') OR
    (t = 3 AND status = 'close') 
  ) cnt
FROM (
  SELECT 1 t, * FROM `project.dataset.table1` UNION ALL
  SELECT 2, * FROM `project.dataset.table2` UNION ALL
  SELECT 3, * FROM `project.dataset.table3` 
)
GROUP BY box_id, date, hour
HAVING cnt > 0

或者

#standardSQL
SELECT box_id, DATE(date) date, EXTRACT(HOUR FROM date) hour, COUNT(1) cnt
FROM (
  SELECT * FROM `project.dataset.table1` WHERE status = 'finished' UNION ALL
  SELECT * FROM `project.dataset.table2` WHERE status = 'start' UNION ALL
  SELECT * FROM `project.dataset.table3` WHERE status = 'close'
)
GROUP BY box_id, date, hour

推荐阅读