How do I create a one-time rake task to move a folder in an Amazon S3 bucket?

Problem description

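I have an Amazon S3 bucket to which I have uploaded many documents, in two folders: "random folder" and "archive". I want to put "random folder" inside the "archive" folder, and I have already changed the path in Settings.yml from path: ':document_folder_name/:filename' to path: 'archive/:document_folder_name/:filename'.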
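In S3 a "folder" is only a key prefix, so after this Settings.yml change every existing key of the form ':document_folder_name/:filename' needs a matching new key 'archive/:document_folder_name/:filename'. A minimal sketch of that key mapping in plain Ruby; archived_key and ARCHIVE_PREFIX are hypothetical names, not part of the question:

```ruby
# Hypothetical helper: maps a key built from ':document_folder_name/:filename'
# to the new layout 'archive/:document_folder_name/:filename'.
ARCHIVE_PREFIX = 'archive/'.freeze

def archived_key(key)
  # Keys already under archive/ are returned unchanged, so running the
  # one-off task a second time does not produce archive/archive/... keys.
  return key if key.start_with?(ARCHIVE_PREFIX)

  "#{ARCHIVE_PREFIX}#{key}"
end

puts archived_key('random folder/report.pdf') # prints archive/random folder/report.pdf
puts archived_key('archive/old/report.pdf')   # prints archive/old/report.pdf
```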

I want to create a one-time rake task that moves all documents currently in "random folder" into my "archive" folder, and I have built it as follows.

    module Courts
      class SyncronizeBucketJob < ApplicationJob
        SRC_FOLDER = 'ocr_scanner'.freeze
        ARCHIVE_FOLDER = 'archive'.freeze
        DATUM_FOLDER = '13.08.2021'.freeze

        def perform
          s3_objects = pull_s3_files
          create_ocr_documents(s3_objects)
          delete_s3_objects(s3_objects)
        end

        private

        # s3_resource, s3_client and bucket_name are not shown in the question;
        # they are assumed to be defined elsewhere (e.g. in ApplicationJob).

        def pull_s3_files
          # read the .pdf files under the 'ocr_scanner' prefix
          s3_resource
            .bucket(bucket_name)
            .objects(prefix: SRC_FOLDER)
            .select { |obj| obj.key.ends_with?('.pdf') }
        end

        def create_ocr_documents(s3_objects)
          s3_objects.each do |obj|
            resp = s3_client.get_object(bucket: bucket_name, key: obj.key)
            file = resp.body
            OcrDocument.create(document: file)
          end
        end

        def remove_s3_objects(s3_objects)
          s3_objects.each(&:delete)
        end

        # Despite its name, this method moves each object from SRC_FOLDER
        # to ARCHIVE_FOLDER instead of deleting it.
        def delete_s3_objects(s3_objects)
          s3_objects.each do |obj|
            new_key = obj.key.sub(SRC_FOLDER, ARCHIVE_FOLDER)
            # NOTE: put without a body uploads an empty object under obj.key,
            # so the PDF's content is replaced before it is moved.
            obj.put(metadata: { 'new_key' => 'ok' })
            obj.move_to(bucket: bucket_name, key: new_key)
          end
        end
      end
    end

My rake file is as follows:

    namespace :courts do
      task update_ocr_documents: :environment do
        OcrDocument.find_each do |ocr_document|
          ::Courts::Operations::OcrDocuments::RecognizeExisting.run(id: ocr_document.id, skip_auth: true)
        end
      end

      task sync_ocr_bucket: :environment do
        ::Courts::SyncronizeBucketJob.perform_now
      end

      # s3_objects, SRC_FOLDER, ARCHIVE_FOLDER and bucket_name are not defined
      # in this file; they only exist inside Courts::SyncronizeBucketJob.
      task move_ocr_documents: :environment do
        s3_objects.each do |obj|
          new_key = obj.key.sub(SRC_FOLDER, ARCHIVE_FOLDER)
          obj.put(metadata: { 'new_key' => 'ok' })
          obj.move_to(bucket: bucket_name, key: new_key)
        end
      end
    end
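Because a rake task body cannot see the job's private methods or its constants, one option is to put the move logic in a small plain Ruby object that the task calls directly. A minimal sketch, assuming the aws-sdk-s3 resource interface (Aws::S3::Bucket#objects(prefix:) and ObjectSummary#move_to, which copies to the new key and then deletes the original); the class ArchiveMover, the 'random folder/' prefix and the S3_BUCKET environment variable are hypothetical:

```ruby
# Hypothetical plain-Ruby class (not from the question) that holds the
# one-off move logic so a rake task can call it without going through a job.
class ArchiveMover
  ARCHIVE_PREFIX = 'archive/'.freeze

  # bucket: anything that responds to #name and #objects(prefix:), such as an
  # Aws::S3::Bucket. Each listed object must respond to #key and #move_to.
  def initialize(bucket)
    @bucket = bucket
  end

  # Moves every object under src_prefix to archive/<original key> and
  # returns the list of new keys.
  def move_prefix(src_prefix)
    @bucket.objects(prefix: src_prefix).map do |obj|
      new_key = "#{ARCHIVE_PREFIX}#{obj.key}"
      # S3 has no real folders: moving means copying the object to a new key
      # and deleting the old one, which is what move_to does.
      obj.move_to(bucket: @bucket.name, key: new_key)
      new_key
    end
  end
end

# Possible use from lib/tasks/courts.rake (assumes aws-sdk-s3 is configured):
#
#   task move_ocr_documents: :environment do
#     bucket = Aws::S3::Resource.new.bucket(ENV.fetch('S3_BUCKET'))
#     ArchiveMover.new(bucket).move_prefix('random folder/')
#   end
```

Keeping the S3 calls behind a duck-typed bucket also lets the task be exercised with fake objects before it touches the real bucket.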

When I try to run the rake task, I get this error:

rake aborted!
Don't know how to build task 'courts:move_ocr_documents' (See the list of available tasks with `rake --tasks`)
Did you mean?  courts:sync_documents
               courts:update_ocr_documents
/Users/co/.rvm/gems/ruby-2.5.1/gems/rake-13.0.3/exe/rake:27:in `<top (required)>'
/Users/co/.rvm/gems/ruby-2.5.1/bin/ruby_executable_hooks:22:in `eval'
/Users/co/.rvm/gems/ruby-2.5.1/bin/ruby_executable_hooks:22:in `<main>'
(See full trace by running task with --trace)
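Rake reports "Don't know how to build task" when no loaded file defines a task with that name; in a Rails app those tasks come from the .rake files under lib/tasks. The suggestions above include courts:sync_documents, which the quoted rake file does not define, so the tasks Rake loaded seem to come from a different or older copy of the file (an inference, not something the question confirms). Running rake -W courts prints each courts task with the file and line that defines it. The Rails-free sketch below reproduces the error with Rake's own API (Rake::Task.task_defined? and Rake::Task[]):

```ruby
require 'rake'

# Start from an empty Rake application so only the tasks below exist.
Rake.application = Rake::Application.new

# Equivalent of a .rake file that defines sync_ocr_bucket but not
# move_ocr_documents.
namespace :courts do
  task :sync_ocr_bucket do
    puts 'syncing'
  end
end

puts Rake::Task.task_defined?('courts:sync_ocr_bucket')    # prints true
puts Rake::Task.task_defined?('courts:move_ocr_documents') # prints false

# Looking up a task that was never loaded raises the same RuntimeError
# that `rake courts:move_ocr_documents` reports.
message =
  begin
    Rake::Task['courts:move_ocr_documents']
    nil
  rescue RuntimeError => e
    e.message
  end
puts message
```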

When I change delete_s3_objects to match the rake task (is that what is needed?), I get this error instead:

rake courts:delete_s3_objects --trace
** Invoke courts:delete_s3_objects (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute courts:delete_s3_objects
rake aborted!
NoMethodError: undefined method `delete_s3_objects' for Courts::SyncronizeBucketJob:Class
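The second error comes from ordinary Ruby method lookup: ::Courts::SyncronizeBucketJob.delete_s3_objects asks the class object for a method, but delete_s3_objects is a private instance method of the job, hence "undefined method ... for Courts::SyncronizeBucketJob:Class". A minimal, Rails-free sketch of the same situation; SyncJobDemo is a hypothetical stand-in, and its perform_now only imitates ActiveJob's class-level perform_now:

```ruby
# Hypothetical stand-in for the job in the question (no Rails or S3 needed).
class SyncJobDemo
  # Imitates ActiveJob: the class-level entry point builds an instance.
  def self.perform_now
    new.perform
  end

  def perform
    delete_s3_objects(%w[a.pdf b.pdf])
  end

  private

  # Private instance method, like delete_s3_objects in the job.
  def delete_s3_objects(keys)
    keys.map { |k| "moved #{k}" }
  end
end

# Calling the helper on the class fails: the class object has no such method.
begin
  SyncJobDemo.delete_s3_objects([])
rescue NoMethodError => e
  puts e.message
end

# Going through the public entry point works.
p SyncJobDemo.perform_now # prints ["moved a.pdf", "moved b.pdf"]
```

Applied to the question, the existing sync_ocr_bucket task (which calls perform_now) already runs the move step; a task that should only move files needs its own public entry point, such as a separate class or job, rather than a call to the private helper.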

Tags: ruby-on-rails, ruby, amazon-web-services, amazon-s3
