Union two tables with different start dates based on a column value?

Problem description

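We switched to a new platform on January 1, and I need to merge two tables into a single data source that combines the old and new data. However, some accounts had to be moved off the old platform before January 1.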

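The new data table contains December data for all accounts, but I only want to use the new December data where there is no old December data. How can I combine the tables so that most accounts use new data starting January 1, while a few exception accounts use new data starting from the appropriate date in December?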

For example: for Account1 I need new data starting January 1; for Account2, new data starting December 30; for Account3, new data starting December 31.

Old Table  
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account2        12-29-18     10    
Account3        12-29-18     20  
Account3        12-30-18     10

New Table
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account1        01-01-19     20  
Account2        12-30-18     15  
Account2        12-31-18     20  
Account2        01-01-19     10  
Account3        12-30-18     10  
Account3        12-31-18     20  
Account3        01-01-19     5  

Output
------------------------------------   
Account         Date         Sales  
------------------------------------
Account1        12-29-18     10  
Account1        12-30-18     10  
Account1        12-31-18     5  
Account1        01-01-19     20  
Account2        12-29-18     10
Account2        12-30-18     15  
Account2        12-31-18     20  
Account2        01-01-19     10
Account3        12-29-18     20  
Account3        12-30-18     10
Account3        12-31-18     20  
Account3        01-01-19     5  
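The question frames the problem as a per-account cutover date. Judging from the example, that date is the day after the account's last row in the old table (Account1 → Jan 1, Account2 → Dec 30, Account3 → Dec 31). A minimal Python sketch of that reading follows; the helpers `parse`, `cutover_dates` and `combine` are hypothetical, and accounts with no old rows at all are assumed to switch on January 1 (`DEFAULT_CUTOVER`).

```python
from datetime import date, datetime, timedelta

# Sample data from the question (MM-DD-YY dates, as in the tables above).
OLD_TABLE = """
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account2 12-29-18 10
Account3 12-29-18 20
Account3 12-30-18 10
"""
NEW_TABLE = """
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
"""

# Assumption: accounts with no old rows switch on the platform-wide date.
DEFAULT_CUTOVER = date(2019, 1, 1)


def parse(table):
    """Turn 'Account MM-DD-YY Sales' lines into (account, date, sales) tuples."""
    rows = []
    for line in table.strip().splitlines():
        account, day, sales = line.split()
        rows.append((account, datetime.strptime(day, "%m-%d-%y").date(), int(sales)))
    return rows


def cutover_dates(old_rows):
    """New data starts the day after each account's last old row."""
    last = {}
    for account, day, _ in old_rows:
        last[account] = max(day, last.get(account, day))
    return {account: day + timedelta(days=1) for account, day in last.items()}


def combine(old_rows, new_rows):
    """Keep all old rows plus the new rows on or after each account's cutover."""
    cutover = cutover_dates(old_rows)
    keep_new = [row for row in new_rows
                if row[1] >= cutover.get(row[0], DEFAULT_CUTOVER)]
    return sorted(old_rows + keep_new)


for account, day, sales in combine(parse(OLD_TABLE), parse(NEW_TABLE)):
    print(account, day.strftime("%m-%d-%y"), sales)
```

Unlike the row-by-row precedence used in the answer, a cutover date leaves a gap if the old table is missing a day before its last old row; which behavior is preferable depends on the data.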

Tags: sql, google-bigquery, union

Solution


Below is for BigQuery Standard SQL

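  #standardSQL
  SELECT account, date, 
    -- 'old' sorts after 'new', so DESC puts the old row first:
    -- the new value is used only when there is no old row for that date
    ARRAY_AGG(sales ORDER BY data DESC LIMIT 1)[OFFSET(0)] sales
  FROM (
    -- tag each row with the table it came from
    SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL 
    SELECT 'new' data, * FROM `project.dataset.new_table` 
  )
  GROUP BY account, date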
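The query works row by row rather than by computing cutover dates: it tags each row with its source, groups by account and date, and keeps one sales value per group. The question asks for the old table's value to win whenever both tables have a row for the same account and date. A plain-Python sketch of that grouping rule, for checking it without BigQuery (the `parse` and `prefer_old` helpers are hypothetical, not part of the answer):

```python
from datetime import datetime

# Sample data from the question (MM-DD-YY dates, as in the tables above).
OLD_TABLE = """
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account2 12-29-18 10
Account3 12-29-18 20
Account3 12-30-18 10
"""
NEW_TABLE = """
Account1 12-29-18 10
Account1 12-30-18 10
Account1 12-31-18 5
Account1 01-01-19 20
Account2 12-30-18 15
Account2 12-31-18 20
Account2 01-01-19 10
Account3 12-30-18 10
Account3 12-31-18 20
Account3 01-01-19 5
"""


def parse(table):
    """Turn 'Account MM-DD-YY Sales' lines into (account, date, sales) tuples."""
    rows = []
    for line in table.strip().splitlines():
        account, day, sales = line.split()
        rows.append((account, datetime.strptime(day, "%m-%d-%y").date(), int(sales)))
    return rows


def prefer_old(old_rows, new_rows):
    """Per (account, date), keep the old table's sales if present, else the new one's."""
    # Tag each row with its source, like the 'old'/'new' literal column `data`.
    tagged = [("old",) + row for row in old_rows] + [("new",) + row for row in new_rows]
    groups = {}
    for source, account, day, sales in tagged:
        groups.setdefault((account, day), []).append((source, sales))
    result = []
    for (account, day), candidates in groups.items():
        # 'old' sorts after 'new' as strings, so the max by source is the old
        # row whenever one exists.
        _, sales = max(candidates, key=lambda c: c[0])
        result.append((account, day, sales))
    return sorted(result)


for account, day, sales in prefer_old(parse(OLD_TABLE), parse(NEW_TABLE)):
    print(account, day.strftime("%m-%d-%y"), sales)
```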

You can test and play with the query above using the sample data from your question, as in the example below

  #standardSQL
  WITH `project.dataset.old_table` AS (
    SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL  
    SELECT 'Account1', '12-30-18', 10 UNION ALL  
    SELECT 'Account1', '12-31-18', 5 UNION ALL  
    SELECT 'Account2', '12-29-18', 10 UNION ALL    
    SELECT 'Account3', '12-29-18', 20 UNION ALL  
    SELECT 'Account3', '12-30-18', 10 
  ),  `project.dataset.new_table` AS (
    SELECT 'Account1' account, '12-29-18' date, 10 sales UNION ALL
    SELECT 'Account1', '12-30-18', 10 UNION ALL
    SELECT 'Account1', '12-31-18', 5 UNION ALL
    SELECT 'Account1', '01-01-19', 20 UNION ALL
    SELECT 'Account2', '12-30-18', 15 UNION ALL
    SELECT 'Account2', '12-31-18', 20 UNION ALL
    SELECT 'Account2', '01-01-19', 10 UNION ALL
    SELECT 'Account3', '12-30-18', 10 UNION ALL
    SELECT 'Account3', '12-31-18', 20 UNION ALL
    SELECT 'Account3', '01-01-19', 5 
  )
  SELECT account, date, 
    ARRAY_AGG(sales ORDER BY data DESC LIMIT 1)[OFFSET(0)] sales
  FROM (
    SELECT 'old' data, * FROM `project.dataset.old_table` UNION ALL 
    SELECT 'new' data, * FROM `project.dataset.new_table` 
  )
  GROUP BY account, date
  ORDER BY account, PARSE_DATE('%m-%d-%y', date) 

Result

Row account     date        sales    
1   Account1    12-29-18    10   
2   Account1    12-30-18    10   
3   Account1    12-31-18    5    
4   Account1    01-01-19    20   
5   Account2    12-29-18    10   
6   Account2    12-30-18    15   
7   Account2    12-31-18    20   
8   Account2    01-01-19    10   
9   Account3    12-29-18    20   
10  Account3    12-30-18    10   
11  Account3    12-31-18    20   
12  Account3    01-01-19    5    
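The answer relies on BigQuery's `ARRAY_AGG`. As a swapped-in alternative (not part of the original answer), the same old-first rule can be written as an anti-join: take every old row, plus the new rows whose account and date have no old row. The sketch below runs that query against an in-memory SQLite database from Python's standard library so it can be tried locally; the table names mirror the question, and the dates are assumed to be stored as ISO strings so that `ORDER BY date` sorts chronologically. The same `UNION ALL ... WHERE NOT EXISTS` pattern should also carry over to BigQuery Standard SQL.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (assumption)
# so that ordering by the text column is chronological.
OLD_ROWS = [
    ("Account1", "2018-12-29", 10), ("Account1", "2018-12-30", 10),
    ("Account1", "2018-12-31", 5), ("Account2", "2018-12-29", 10),
    ("Account3", "2018-12-29", 20), ("Account3", "2018-12-30", 10),
]
NEW_ROWS = [
    ("Account1", "2018-12-29", 10), ("Account1", "2018-12-30", 10),
    ("Account1", "2018-12-31", 5), ("Account1", "2019-01-01", 20),
    ("Account2", "2018-12-30", 15), ("Account2", "2018-12-31", 20),
    ("Account2", "2019-01-01", 10), ("Account3", "2018-12-30", 10),
    ("Account3", "2018-12-31", 20), ("Account3", "2019-01-01", 5),
]

# All old rows, plus the new rows that have no old row for the same account and date.
QUERY = """
SELECT account, date, sales FROM old_table
UNION ALL
SELECT n.account, n.date, n.sales
FROM new_table AS n
WHERE NOT EXISTS (
    SELECT 1 FROM old_table AS o
    WHERE o.account = n.account AND o.date = n.date
)
ORDER BY account, date
"""


def combine(old_rows, new_rows):
    """Load both tables into a throwaway SQLite database and run the anti-join."""
    conn = sqlite3.connect(":memory:")
    try:
        for table in ("old_table", "new_table"):
            conn.execute(f"CREATE TABLE {table} (account TEXT, date TEXT, sales INTEGER)")
        conn.executemany("INSERT INTO old_table VALUES (?, ?, ?)", old_rows)
        conn.executemany("INSERT INTO new_table VALUES (?, ?, ?)", new_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in combine(OLD_ROWS, NEW_ROWS):
    print(*row)
```

An anti-join avoids building an array per group, and it makes the precedence explicit in the `WHERE` clause instead of relying on how the `'old'`/`'new'` labels sort.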
