首页 > 解决方案 > 优化查询,检查 user_id 是否存在于许多表中

问题描述

我的查询优化有问题(bigquery 向我显示它将处理 26 GB 的数据,我不想消耗所有这些数据......)。我想检查 user_id 是否存在于 4 个不同的表中,如果它会在其中一个表中找到它,则返回平台的名称。即使不使用所有这些情况,有没有办法优化这个逻辑?

CASE
   WHEN a.user_id in (SELECT safe_cast(USER_ID as int64) FROM `platform1_table`
   where front_date >= "2016-01-01") THEN "platform1"
   ELSE NULL
END AS platform1,
CASE
   WHEN a.user_id in (SELECT safe_cast(USER_ID as int64) FROM `platform2_table`
   where front_date >= "2016-01-01") THEN "platform2"
   ELSE NULL
END AS platform2,
CASE
   WHEN a.user_id in (SELECT safe_cast(USER_ID as int64) FROM `platform3_table`
   WHERE date(snapshot_time)>= "2016-01-01") THEN "platform3"
   ELSE NULL
END AS platform3,
CASE
   WHEN a.user_id in (SELECT safe_cast(USER_ID as int64) FROM `platform4_table`) THEN "platform4"
   ELSE NULL
END AS platform4

标签: sqlgoogle-bigquery

解决方案


你可以试试加盟条件

SELECT 
CASE
   WHEN b.USER_ID IS NOT NULL THEN "platform1"
   ELSE NULL
END AS platform1,
CASE
   WHEN c.USER_ID IS NOT NULL THEN "platform2"
   ELSE NULL
END AS platform2,
CASE
   WHEN d.USER_ID IS NOT NULL THEN "platform4"
   ELSE NULL
END AS platform3,
CASE
   WHEN e.USER_ID IS NOT NULL THEN "platform4"
   ELSE NULL
END AS platform4
FROM MAIN_TABLE a 
LEFT JOIN `platform1_table` b on a.user_id = safe_cast(b.USER_ID as int64)
LEFT JOIN `platform2_table`c ON a.user_id = safe_cast(c.USER_ID as int64)
LEFT JOIN `platform3_table`d ON a.user_id = safe_cast(d.USER_ID as int64)
LEFT JOIN `platform4_table`e ON a.user_id = safe_cast(e.USER_ID as int64)
WHERE a.front_date >= "2016-01-01"
and b.front_date >= "2016-01-01"
and date(c.snapshot_time)>= "2016-01-01"

尽管此解决方案可以通过在联接和过滤字段上实施分区和集群来最小化处理的数据。

这个解决方案会更快,因为我们正在加入数据IN使用现金。


推荐阅读