BigQuery: LEFT JOIN a table and filter its array elements by a condition

Problem description

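I want to join one table to another table that contains arrays, and in the join result I only want the array elements that pass a condition, in this case a date condition. The snippet below illustrates my problem. I want the output's ids to include only the elements whose record_dates are earlier than '2019-10-15'.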

WITH platform AS (
        SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' as record_date
        UNION ALL
        SELECT 'u2' AS id, 'm1' AS platform_id, '2019-10-13' as record_date
        UNION ALL
        SELECT 'u21' AS id, 'm1' AS platform_id, '2019-10-16' as record_date    
), 

platform_agg AS (
        SELECT platform_id
              , ARRAY_AGG(id) as ids
              , ARRAY_AGG(record_date) as record_dates
        FROM platform
        GROUP BY platform_id
),


orders AS(
        SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' as order_date 
), 


orders_plus_platform AS ( 
SELECT order_id
      , orders.id 
      , orders.order_date
      , platform.platform_id 
      , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.ids ELSE [orders.id] END AS ids
      , CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.record_dates ELSE NULL END AS record_dates
FROM orders
    LEFT JOIN platform
        ON orders.id = platform.id and platform.record_date <= orders.order_date
    LEFT JOIN platform_agg
        ON platform.platform_id = platform_agg.platform_id 
)

SELECT * FROM orders_plus_platform

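The current query returns all three ids (u1, u2 and u21) for the order. In the desired output, the element u21 should be filtered out, because its record_date is after '2019-10-15'.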


Thanks,

Tags: arrays, google-bigquery

Solution


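The following solution worked for me. Basically, you join the platform table twice to get all the ids associated with a platform, instead of joining the pre-aggregated version of platform. Because the second join carries the date condition, rows recorded after the order date never reach ARRAY_AGG, which makes the filter much easier to apply.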

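-- Replaces the orders_plus_platform CTE in the question's query;
-- the platform_agg CTE is no longer needed.
orders_plus_platform AS (
SELECT order_id
      , orders.id
      , orders.order_date
      , platform.platform_id
      -- platform2 supplies every id on the same platform that passed the date filter
      , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.id ELSE orders.id END) AS ids
      , ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.record_date ELSE NULL END) AS record_dates
FROM orders
    -- first join: find the platform the ordering id belongs to
    LEFT JOIN platform
        ON orders.id = platform.id AND platform.record_date <= orders.order_date
    -- second join: fetch the other ids on that platform, filtered by date
    LEFT JOIN platform platform2
        ON platform.platform_id = platform2.platform_id AND platform2.record_date <= orders.order_date
GROUP BY
      order_id
      , orders.id
      , orders.order_date
      , platform.platform_id
)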
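
If you would rather keep the pre-aggregated table, one possible alternative is to filter the array itself at join time. The sketch below is not part of the original answer. It assumes the ids and dates are aggregated together as an array of STRUCTs, so each id stays attached to its own record_date, and it uses the same `<=` comparison as the joins above. The column name `members` and the alias `m` are made up for illustration.

```sql
-- Sketch: filter a pre-aggregated array per order with ARRAY(SELECT ... FROM UNNEST(...)).
WITH platform AS (
  SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' AS record_date
  UNION ALL
  SELECT 'u2', 'm1', '2019-10-13'
  UNION ALL
  SELECT 'u21', 'm1', '2019-10-16'
),
platform_agg AS (
  SELECT platform_id,
         -- "members" is a hypothetical column name: one STRUCT per (id, record_date) pair
         ARRAY_AGG(STRUCT(id, record_date)) AS members
  FROM platform
  GROUP BY platform_id
),
orders AS (
  SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' AS order_date
)
SELECT
  orders.order_id,
  orders.id,
  orders.order_date,
  platform.platform_id,
  -- keep only the ids recorded on or before the order date
  IF(platform.platform_id IS NOT NULL,
     ARRAY(SELECT m.id
           FROM UNNEST(platform_agg.members) AS m
           WHERE m.record_date <= orders.order_date
           ORDER BY m.record_date),
     [orders.id]) AS ids,
  -- the matching dates, filtered and ordered the same way
  IF(platform.platform_id IS NOT NULL,
     ARRAY(SELECT m.record_date
           FROM UNNEST(platform_agg.members) AS m
           WHERE m.record_date <= orders.order_date
           ORDER BY m.record_date),
     NULL) AS record_dates
FROM orders
LEFT JOIN platform
  ON orders.id = platform.id AND platform.record_date <= orders.order_date
LEFT JOIN platform_agg
  ON platform.platform_id = platform_agg.platform_id
```

With the sample data from the question, this should return one row for order c1 with ids [u1, u2] and record_dates ['2019-10-12', '2019-10-13']. The dates are compared as strings here, which only works because they are in ISO yyyy-mm-dd format; with real DATE columns the comparison is the same but type-safe.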
