Deleting S3 bucket objects with Airflow

Problem description

Using S3DeleteObjectsOperator in Airflow does not delete the objects from the specified S3 bucket, even though the task reports them as successfully deleted:

delete_s3bucket_files = S3DeleteObjectsOperator(
  task_id='delete_s3bucket_files',
  start_date=start_date,
  bucket='*********************',
  keys='*********************',
  aws_conn_id='aws_default',
)

So the task log says the key was deleted, but the object still exists in my S3 bucket.

[2019-09-16 11:39:25,775] {base_task_runner.py:101} INFO - Job 1346: Subtask delete_s3bucket_files [2019-09-16 11:39:25,775] {cli.py:517} INFO - Running <TaskInstance: daily_database_transfer.delete_s3bucket_files 2019-09-16T09:39:16.873030+00:00 [running]> on host Saurav-macbook.local
[2019-09-16 11:39:25,971] {s3_delete_objects_operator.py:83} INFO - Deleted: ['*********************']

What am I missing here, or is there a way to find out why it isn't deleting the objects?

Tags: amazon-s3, airflow

Solution


You can use S3DeleteBucketOperator with force_delete=True to force-delete all objects in the bucket before the bucket itself is deleted.

So you could do something like this:

from airflow.providers.amazon.aws.operators.s3_bucket import S3DeleteBucketOperator

delete_s3bucket = S3DeleteBucketOperator(
  task_id='delete_s3bucket_task',
  force_delete=True,
  start_date=start_date,
  bucket_name='*********************',
  aws_conn_id='aws_default',
)
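
Note that the import path above depends on which version of apache-airflow-providers-amazon you have installed. In newer provider releases the S3 operators were consolidated into a single s3 module; a minimal sketch, assuming a recent provider version:

# In newer apache-airflow-providers-amazon releases (assumption: a version
# after the module consolidation), both operators are importable from the
# single s3 module instead of s3_bucket / s3_delete_objects:
from airflow.providers.amazon.aws.operators.s3 import (
  S3DeleteBucketOperator,
  S3DeleteObjectsOperator,
)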

If you prefer to delete the files and then delete the bucket as separate steps, you can do the following:

from airflow.providers.amazon.aws.operators.s3_bucket import S3DeleteBucketOperator
from airflow.providers.amazon.aws.operators.s3_delete_objects import S3DeleteObjectsOperator

delete_s3bucket_files = S3DeleteObjectsOperator(
  task_id='delete_s3bucket_files',
  start_date=start_date,
  bucket='*********************',
  keys='*********************',
  aws_conn_id='aws_default',
)

delete_s3bucket = S3DeleteBucketOperator(
  task_id='delete_s3bucket_task',
  force_delete=False,  # bucket will be deleted only if it's empty
  start_date=start_date,
  bucket_name='*********************',
  aws_conn_id='aws_default',
)

delete_s3bucket_files >> delete_s3bucket
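
If you also want to confirm from inside the DAG that the objects really are gone (as asked above), you could add a small check task using S3Hook. This is only a sketch, assuming Airflow 2.x with the Amazon provider installed; the task id, bucket name, and key below are placeholders, not values from the question:

from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def _assert_key_deleted():
  # check_for_key returns True while the object still exists in the bucket
  hook = S3Hook(aws_conn_id='aws_default')
  if hook.check_for_key(key='path/to/your/key', bucket_name='your-bucket-name'):
    raise ValueError('Object is still present in the bucket')

check_key_deleted = PythonOperator(
  task_id='check_key_deleted',
  python_callable=_assert_key_deleted,
)

# run the check after the delete task
delete_s3bucket_files >> check_key_deleted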
