Best approach to open and parse a file from an FTP (Rails, Sidekiq, SFTP)?

Problem description

We have a shared FTP server with our business partner. The partner uploads text files containing order information there.

On our side, a cron job runs a rake task every hour. The task connects to the FTP server (we're using SFTP), lists all the files, and then in a loop opens each one, extracts the information we need, and saves it to our database. Then we delete the file from the FTP server.

For an average file, this operation takes about 20 seconds. If we needed to process 20 files, that would be almost 7 minutes, and my worry is that the app would crash.

Here's the pseudocode we use:

require 'net/sftp'

namespace :check_ftp do
  desc "Check every 30 minutes"
  task :fetch_orders => [:environment] do
    check_dir = '/dir'
    Net::SFTP.start(host, username, password: pass) do |sftp|
      sftp.dir.entries(check_dir).each do |remote_file|
        next unless remote_file.file? # skip "." / ".." and subdirectories
        file_path = File.join(check_dir, remote_file.name)
        file_data = sftp.download!(file_path) # load the file contents into memory
        # ... here goes all the parsing ...
        # ... here save to the DB ...
        # ... log information about this action ...
        # ... delete the file from FTP ...
      end
    end
  end
end

What's the best approach to process all the files from the FTP with minimal crashes (ideally none) if the rake task runs for ~20 minutes?

One option I considered is running the whole Net::SFTP.start(... block in a Sidekiq job. Another is running only these actions

... here goes all the parsing ...
... here save to the DB ...
... log information about this action ...
... delete the file from FTP ...

in a Sidekiq job (not the whole Net::SFTP.start... block).

What is the best approach here to solve this situation?

Thank you in advance.

Tags: ruby-on-rails, ruby, sftp, sidekiq

Solution


I typically try to break large background jobs down into their atomic parts, which end up being fanned out from the main recurring job.

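In your case, that looks something like this: move your rake task into a Sidekiq job and use something like sidekiq-cron to schedule it. That job is responsible for iterating over the files on the FTP server, downloading them, and then enqueuing a job to parse each file (you pass the job the path to the file). In that second job, you parse the file, save whatever needs to be saved, and finally delete the file.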
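The fan-out step described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: `OrderFileDispatcher`, `RemoteEntry`, and `ParseOrderFileJob` are hypothetical names that do not appear in the original post, and the SFTP listing and the Sidekiq enqueue are injected as callables so the sketch runs without either gem.

```ruby
# Fan-out sketch: one scheduled "dispatcher" step enqueues one small job per file.
# All names here are hypothetical; nothing below comes from the original post.

# Stand-in for an SFTP directory entry (Net::SFTP entries expose #name and #file?).
RemoteEntry = Struct.new(:name, :regular) do
  def file?
    regular
  end
end

class OrderFileDispatcher
  # list:    callable returning the entries of a remote directory
  # enqueue: callable that schedules one parse job for a remote path
  def initialize(list:, enqueue:)
    @list = list
    @enqueue = enqueue
  end

  # Enqueues one job per regular file and returns the enqueued paths.
  def call(dir)
    @list.call(dir).select(&:file?).map do |entry|
      path = File.join(dir, entry.name)
      @enqueue.call(path)
      path
    end
  end
end

# In a Rails app the callables might be wired up roughly like this (assumed):
#   list:    ->(dir)  { sftp.dir.entries(dir) }
#   enqueue: ->(path) { ParseOrderFileJob.perform_async(path) }
queue = []
entries = [
  RemoteEntry.new('.', false),
  RemoteEntry.new('a.txt', true),
  RemoteEntry.new('b.txt', true)
]
dispatcher = OrderFileDispatcher.new(
  list: ->(_dir) { entries },
  enqueue: ->(path) { queue << path }
)
enqueued = dispatcher.call('/dir')
```

Passing only the path (a plain string) keeps the job arguments small and serializable, which is what Sidekiq expects. Each parse job would then open its own SFTP connection, or read a local copy if the dispatcher downloaded the file first.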

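This architecture takes advantage of your Sidekiq workers by letting them parse files concurrently. In addition, if parsing one file fails, it does not block parsing of the remaining files. You can rely on Sidekiq's built-in retry logic to retry that file until it succeeds (whether because an external factor changed or because a bug was fixed). You also end up with faster-completing jobs that each focus on one very specific task ("find all files that need to be parsed", "parse this particular file").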
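To illustrate the failure isolation and retry behavior, here is a toy in-memory queue. It is not Sidekiq and does not use its API; `ParseOrderFile`, `drain`, and `max_attempts` are hypothetical names chosen for this sketch. It only shows the idea that a job which raises is retried later while the other files are still processed.

```ruby
# Toy in-memory queue (NOT Sidekiq) showing why per-file jobs isolate failures:
# a job that raises is re-queued, while the other files are still processed.
# All names are hypothetical.

class ParseOrderFile
  def initialize(parser)
    @parser = parser
  end

  # Raising here is what would signal "retry me later" to a job runner.
  def perform(path)
    @parser.call(path)
  end
end

# Runs every queued [path, attempt] pair; returns [succeeded, permanently_failed].
def drain(queue, job, max_attempts: 3)
  done = []
  failed = []
  until queue.empty?
    path, attempt = queue.shift
    begin
      job.perform(path)
      done << path
    rescue StandardError
      if attempt < max_attempts
        queue << [path, attempt + 1] # retry later, behind the other files
      else
        failed << path
      end
    end
  end
  [done, failed]
end

# Simulate a transient error: "/dir/b.txt" fails on its first attempt only.
calls = Hash.new(0)
parser = lambda do |path|
  calls[path] += 1
  raise 'temporary parse error' if path == '/dir/b.txt' && calls[path] == 1
end

queue = [['/dir/a.txt', 1], ['/dir/b.txt', 1], ['/dir/c.txt', 1]]
done, failed = drain(queue, ParseOrderFile.new(parser))
```

In real Sidekiq the retry happens automatically when `perform` raises, with a backoff between attempts, and the limit can be tuned per job with `sidekiq_options retry: N`. Because a job may run more than once, the parse-and-save step should be safe to repeat, and the remote file should be deleted only after the save succeeds.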
