首页 > 解决方案 > I want to reduce my SQL Query on big Query

问题描述

I want to fetch data from bigQuery database but I get an error =>The query is too large. The maximum query length is 256.000K characters, including comments and white space characters. i will show a part of query which i repeated 21 times

WITH data AS 
(
 SELECT
 IFNULL(department, 'UNKNOWN_DEPARTMENT') AS dept,


> 'C7s'

 AS campus,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999) AS taskCount_0,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1542565800000 AND 1543170599999 

 AND IF (task.deadline.currentEscalationLevel NOT IN 
 (
 'ESC_ACKNOWLEDGEMENT'
 )
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN 
 (
 'NOT_ESCALATED'
 )
) AS escCount_0,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999) AS taskCount_1,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541961000000 AND 1542565799999 
 AND IF (task.deadline.currentEscalationLevel NOT IN 
 (
 'ESC_ACKNOWLEDGEMENT'
 )
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN 
 (
 'NOT_ESCALATED'
 )
) AS escCount_1,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999) AS taskCount_2,
 COUNTIF(task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1541960999999 
 AND IF (task.deadline.currentEscalationLevel NOT IN 
 (
 'ESC_ACKNOWLEDGEMENT'
 )
, task.deadline.currentEscalationLevel, 'NOT_ESCALATED') NOT IN 
 (
 'NOT_ESCALATED'
 )
) AS escCount_2 
 FROM

>  `nsimplbigquery.TaskManagement.C7s_*`

 WHERE
 _TABLE_SUFFIX IN 
 (
 '2018_47_11',
 '2018_45_11',
 '2018_46_11'
 )
 AND IFNULL(department, 'UNKNOWN_DEPARTMENT') IN 
 (
 'ENGG_AND_MAINT_DEPARTMENT',
 'FNB_DEPARTMENT',
 'TELECOM_DEPARTMENT',
 'IT_DEPARTMENT',
 'BILLING_AND_INSURANCE',
 'HOUSEKEEPING_DEPARTMENT'
 )
 AND task.taskRaised.raisedAt.milliSeconds BETWEEN 1541356200000 AND 1543170599999 
 GROUP BY
 dept
)
,
mainQuery AS 
(
 SELECT
 dept,
 campus,
 SUM(taskCount_0) AS taskCount_0,
 SUM(escCount_0) AS escCount_0,
 CAST(SAFE_DIVIDE(SUM(escCount_0), SUM(taskCount_0)) * 10000 AS INT64) AS escPerc_0,
 SUM(taskCount_1) AS taskCount_1,
 SUM(escCount_1) AS escCount_1,
 CAST(SAFE_DIVIDE(SUM(escCount_1), SUM(taskCount_1)) * 10000 AS INT64) AS escPerc_1,
 SUM(taskCount_2) AS taskCount_2,
 SUM(escCount_2) AS escCount_2,
 CAST(SAFE_DIVIDE(SUM(escCount_2), SUM(taskCount_2)) * 10000 AS INT64) AS escPerc_2 
 FROM
 data 
 GROUP BY
 ROLLUP (campus, dept)
)
SELECT
 dept,
 campus,
 taskCount_0,
 escCount_0,
 escPerc_0,
 taskCount_1,
 escCount_1,
 escPerc_1,
 taskCount_2,
 escCount_2,
 escPerc_2 
FROM
 mainQuery 
WHERE
 campus IS NOT NULL 
ORDER BY
 CASE
 WHEN
 dept IS NULL 
 THEN
 1 
 ELSE
 0 
 END
 ASC, dept ASC, campus ASC;

This is the query which I repeat so many times so can due to I have so many ids Where C7s i changed with following ids

C7z, C7u, H0B, IDp, ITR, C7i, C7j, C7k, C7l, C7m, C7o, C71, C7t, F6qZ, C7w, GIui, Fs, C70, C7p, C7r if you see my explainantion i quote a line this nsimplbigquery.TaskManagement.C7s_* so at next query the table names is changed like

nsimplbigquery.TaskManagement.C7z_*

标签: sqlgoogle-bigqueryquery-optimization

解决方案


与其重复整个 SELECT 语句 21 次,不如使用以下方法。您将在 _TABLE_SUFFIX 的列表中有 3x21=63 个条目 - 但您将能够解决查询长度的问题

FROM `nsimplbigquery.TaskManagement.*` 
WHERE _TABLE_SUFFIX IN (
  'C7s_2018_47_11',
  'C7s_2018_45_11',
  'C7s_2018_46_11',
  'C7z_2018_47_11',
  'C7z_2018_45_11',
  'C7z_2018_46_11',
  'C7u_2018_47_11',
  'C7u_2018_45_11',
  'C7u_2018_46_11',
  ...
  ...
  ... 
  'C7r_2018_47_11',
  'C7r_2018_45_11',
  'C7r_2018_46_11',
  )

推荐阅读