首页 > 解决方案 > 从两个单独的表中递归减法以填充历史数据

问题描述

我在 Snowflake 中托管了两个数据集,每天都有社交​​媒体关注者数量。我们将使用的主表 (follower_counts) 按天显示关注者计数:

追随者计数

该表格于 2020 年 4 月 4 日生效,并将每天更新。不幸的是,我无法获得这种格式的历史数据。相反,我有一个包含历史数据 (follower_gains) 的表格,其中显示了几个帐户的每日净追随者收益:

追随者增益

理想情况下-我想从当前表中的最小日期(follower_counts)中获取follower_count值并减去每天的收益总和(有机+付费收益),直到follower_gains表的最小日期,以填写follower_count历史上。此外,这些表中有多个帐户的数据,因此需要按帐户分组。它应该如下所示:

理想表

我只是将这两个表联合在一起,但甚至不知道从哪里开始循环遍历这些行:

WITH a AS (
SELECT
  account_id,
  date,
  organizational_entity,
  organizational_entity_type,
  vanity_name,
  localized_name,
  localized_website,
  organization_type,
  total_followers_count,
  null AS paid_follower_gain,
  null AS organic_follower_gain,
  account_name,
  last_update
FROM follower_counts
  UNION ALL
  SELECT
account_id,
  date,
  organizational_entity,
  organizational_entity_type,
  vanity_name,
  localized_name,
  localized_website,
  organization_type,
  null AS total_followers_count,
  organic_follower_gain,
  paid_follower_gain,
  account_name,
  last_update
  FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a 
ORDER BY date desc LIMIT 100

标签: sqlsnowflake-cloud-data-platform

解决方案


更新:更改unionunion all添加not exists以删除重复项。根据评论进行了更改。

注意:请确保您不要发布表格的图像。很难重新创建您的场景以编写正确的查询。测试此解决方案并更新,以便我可以在必要时进行修改。

您不会loop使用 SQL,因为它不是一种过程语言。您在查询中定义的操作将针对表中的所有行执行。

with cte as (SELECT a.account_id,
              a.date,
              a.organizational_entity,
              a.organizational_entity_type,
              a.vanity_name,
              a.localized_name,
              a.localized_website,
              a.organization_type,
              (a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
              a.account_name,
              a.last_update,
              b.organic_gain,
              b.paid_gain
       FROM follower_counts a
       JOIN follower_gains b ON a.account_id = b.account_id
       AND b.date < (select min(date) from 
       follower_counts c where a.account.id = c.account_id) 
       )
    SELECT b.account_id,
          b.date,
          b.organizational_entity,
          b.organizational_entity_type,
          b.vanity_name,
          b.localized_name,
          b.localized_website,
          b.organization_type,
          b.follower_count,
          b.account_name,
          b.last_update,
          b.organic_gain,
          b.paid_gain
    FROM cte b
    UNION ALL
     SELECT a.account_id,
           a.date,
           a.organizational_entity,
           a.organizational_entity_type,
           a.vanity_name,
           a.localized_name,
           a.localized_website,
           a.organization_type,
           a.follower_count,
           a.account_name,
           a.last_update,
           NULL as organic_gain,
           NULL as paid_gain
    FROM follower_counts a where not exists (select 1 from 
    follower_gains c where a.account_id = c.account_id AND a.date = c.date)

推荐阅读