首页 > 解决方案 > how to expire partitions of existing partitioned tables in BigQuery

问题描述

I need to create a copy of the production dataset in BigQuery to the testing environment and use it to simulate the pipeline processing with new changes.

However, the production dataset is huge. so I usually want to only keep its most recent data for testing.

To do that, I would like to truncate all partitioned data that is older than 30 days in my dataset.

I tried setting partition expiration at the dataset level. it doesn't work.

So how could I do that.

标签: google-bigquery

解决方案


我对此做了一些测试并确认了这一点。

在数据集级别设置默认分区到期时。它仅适用于新表。对于现有的分区表,您需要在单个表级别设置分区以使其分区过期。例如:

 ALTER TABLE `gcp_A.dataset_1.measurements`
 SET OPTIONS (
   -- Sets partition expiration to 30 days
   partition_expiration_days=30
 ); 

select min(stamp) from `gcp_A.dataset_1.measurements`
-- [result] 
-- 2021-06-15 00:00:00 UTC

推荐阅读