DBT invalid snapshot time error on BigQuery

Problem description

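I have data stored in BigQuery. I run DBT against those tables and populate new tables in BigQuery. The dbt run command is executed every 5 minutes via cron, and dbt test is executed once an hour to check data consistency.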

dbt run always succeeds and I have never had a problem with it. However, dbt test occasionally fails (roughly once every 20 runs) with the following error:

Invalid snapshot time 1595674825279 for table xxx:dbt_stg.stg_journey_detail. Cannot read before 1595674825283
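
For reference, the two numbers in the message look like Unix epoch timestamps in milliseconds. That is an assumption based on their magnitude; the message itself does not state the unit. A minimal Python sketch, under that assumption, that decodes both values and measures the gap between them:

```python
from datetime import datetime, timedelta, timezone

# Values copied from the error message. Treating them as Unix epoch
# milliseconds is an assumption; the message does not state the unit.
requested_ms = 1595674825279         # "Invalid snapshot time ..."
earliest_readable_ms = 1595674825283  # "Cannot read before ..."

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# How far the requested snapshot lies before the earliest readable time.
gap_ms = earliest_readable_ms - requested_ms

print("requested snapshot:", ms_to_utc(requested_ms).isoformat())
print("earliest readable: ", ms_to_utc(earliest_readable_ms).isoformat())
print("gap in ms:         ", gap_ms)
```

Read this way, the requested snapshot is 4 ms older than the earliest time at which the table can be read, and both timestamps fall a few seconds after the top of the hour, when the hourly dbt test and the 5-minute dbt run would both start if the two cron schedules are aligned.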

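The failing test is always the not-null test on the primary key of one particular table. That table is an intermediate table, defined as follows: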

with dedupe as (
    select * EXCEPT(row_num) from (
            select * from (
              select *, row_number() over (
                  partition by id
                  order by updated_at desc
              ) as row_num
              from {{ source('master', 'journey_detail') }}
            )
          ) as journey_detail_dedupe
        where journey_detail_dedupe.row_num = 1
)

select
    * except (id, pausable),
    id as journey_detail_id,
    pausable as is_pausable

from dedupe

The test looks like this:

  - name: stg_journey_detail
    description: This model cleans up the journey detail table
    columns:
      - name: journey_detail_id
        description: Primary key
        tests:
          - unique
          - not_null

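It looks like BigQuery never actually runs the query in these cases, so I suspect the error is raised by DBT. If I search the BigQuery logs for the test name, I can see the other occasions on which the query did run, and this is the query: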

/* {"app": "dbt", "dbt_version": "0.17.0", "profile_name": "beryl", "target_name": "default", "node_id": "test.beryl.not_null_stg_journey_detail_journey_detail_id"} */

select count(*) as validation_errors
from `beryl-basis-development`.`dbt_stg`.`stg_journey_detail`
where journey_detail_id is null

Any help is much appreciated!

Tags: google-bigquery, dbt

Solution
