首页 > 解决方案 > 在大查询中连接多个表

问题描述

我想在 BigQuery 中加入多个表,但在 BigQuery 中加入多个表的解决方案并没有帮助我获得所需的输出。

我的出发点如下。我正在创建 5 个单独的表格,这些表格显示特定页面可能的每个评级值。在此处查看示例输出:

原始表

该表是通过以下方式创建的:

#standardSQL
  CREATE TEMPORARY FUNCTION tables_in_range(suffix STRING) AS (suffix BETWEEN (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-01')))
    AND (
    SELECT
      FORMAT_DATE('%y%m%d',
        DATE('2018-06-30'))));

SELECT
  h.page.pagePath AS page,
  Count(h.eventInfo.eventLabel)as five_star
FROM
  `table.ga_sessions_20*` AS t,
  t.hits AS h
WHERE
  h.eventInfo.eventAction='rating'
  AND h.eventInfo.eventLabel ='5'
  AND tables_in_range(_TABLE_SUFFIX)
  AND REGEXP_CONTAINS(h.page.pagePath,
    r'/xyz/')
  AND h.type='EVENT'
group by 1

当按照此处所述加入表时,在 bigquery 中加入多个表,不幸的是,我没有得到预期的结果。相反,联接只查看所有 5 个表共有的页面 - 这意味着这些页面在 1-5 的五个可能值中的每一个中都有一个评级。请参阅下面的示例输出。 联合表结果

select
five_star.page as page,
five_star.five_star as five_star,
four_star.four_star as four_star,
three_star.three_star as three_star,
two_star.two_star as two_star,
one_star.one_star as one_star
from five_star
join four_star using (page)
join three_star using (page)
join two_star using (page)
JOIN one_star using (page)

我想通过我的加入来实现一个像这样的表: 期望的输出。我看到的问题是,如果一个页面没有收到某个评级,它将不会加入查询 atm。不幸的是,我无法通过 Union all、Cross Join 或 left join 找到解决方案,所以我非常感谢这里的任何支持!

标签: sqlgoogle-bigquerybigquery-standard-sql

解决方案


不幸的是,我无法通过 Union all、Cross Join 或 left join 找到解决方案......

另一种选择是使用 FULL JOIN,如下例所示

#standardSQL
SELECT
  COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
  IFNULL(five_star.five_star_rating, 0) AS five_star,
  IFNULL(four_star.four_star_rating, 0) AS four_star,
  IFNULL(three_star.three_star_rating, 0) AS three_star,
  IFNULL(two_star.two_star_rating, 0) AS two_star,
  IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)

您可以使用您问题中的虚拟数据进行测试,如下所示

#standardSQL
WITH `project.dataset.table5` AS (
  SELECT 'A' page, 1 five_star_rating UNION ALL
  SELECT 'B', 1 UNION ALL
  SELECT 'C', 1 
), `project.dataset.table4` AS (
  SELECT 'C' page, 1 four_star_rating UNION ALL
  SELECT 'D', 1 UNION ALL
  SELECT 'F', 1 
), `project.dataset.table3` AS (
  SELECT 'F' page, 1 three_star_rating UNION ALL
  SELECT 'G', 1 UNION ALL
  SELECT 'H', 1 
), `project.dataset.table2` AS (
  SELECT 'H' page, 1 two_star_rating UNION ALL
  SELECT 'I', 1 UNION ALL
  SELECT 'J', 1 
), `project.dataset.table1` AS (
  SELECT 'J' page, 1 one_star_rating UNION ALL
  SELECT 'K', 1 UNION ALL
  SELECT 'L', 1 
)
SELECT
  COALESCE(five_star.page, four_star.page, three_star.page, two_star.page, one_star.page) AS page,
  IFNULL(five_star.five_star_rating, 0) AS five_star,
  IFNULL(four_star.four_star_rating, 0) AS four_star,
  IFNULL(three_star.three_star_rating, 0) AS three_star,
  IFNULL(two_star.two_star_rating, 0) AS two_star,
  IFNULL(one_star.one_star_rating, 0) AS one_star
FROM `project.dataset.table5` five_star
FULL JOIN `project.dataset.table4` four_star USING (page)
FULL JOIN `project.dataset.table3` three_star USING (page)
FULL JOIN `project.dataset.table2` two_star USING (page)
FULL JOIN `project.dataset.table1` one_star USING (page)   

结果符合预期:

Row page    five_star   four_star   three_star  two_star    one_star     
1   A       1           0           0           0           1    
2   B       1           0           0           0           1    
3   C       1           1           0           0           1    
4   D       0           1           0           0           0    
5   F       0           1           1           0           0    
6   G       0           0           1           0           0    
7   H       0           0           1           1           0    
8   I       0           0           0           1           0    
9   J       0           0           0           1           0    

推荐阅读