ruby-on-rails - Best approach to open and parse a file from an FTP (Rails, Sidekiq, SFTP)?
Problem description
We have a shared FTP server with our business partner. The partner uploads text files there containing information about orders.
On our side, a rake task runs every hour (via cron). It connects to the FTP server (we're using SFTP), lists all the files in a loop, opens them one after another, extracts the needed information, and saves it to our database. Then we delete the file from the FTP server.
For an average file, this operation takes about 20s. If we needed to process 20 files, that would be almost 7 minutes, and my worry is that the app would crash.
Here's the pseudocode we use:
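require 'net/sftp'

namespace :check_ftp do
  desc "Check every 30 minutes"
  task :fetch_orders => [:environment] do
    check_dir = '/dir'
    # host, username and pass are assumed to come from the app's configuration
    Net::SFTP.start(host, username, password: pass) do |sftp|
      sftp.dir.entries(check_dir).each do |remote_file|
        next unless remote_file.file? # skip ".", ".." and subdirectories
        file_path = File.join(check_dir, remote_file.name)
        file_data = sftp.download!(file_path) # loads the file contents into memory as a string
        # ... here goes all the parsing ...
        # ... here save to the DB ...
        # ... log information about this action ...
        # ... delete the file from FTP ...
      end
    end
  end
end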
What's the best approach to process all the files from the FTP server with as few crashes as possible (ideally none) if the rake task ends up running for ~20 minutes?
I was thinking of running the whole Net::SFTP.start(...) block with Sidekiq. Alternatively, I could run only these actions
... here goes all the parsing ...
... here save to the DB ...
... log information about this action ...
... delete the file from FTP ...
in a Sidekiq job (not the whole Net::SFTP.start(...) block).
What is the best approach here to solve this situation?
Thank you in advance.
Solution
I typically try to break large background jobs into their atomic parts, which end up being spun off from the main recurring job.
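In your case, that would look something like this: move your rake task into a Sidekiq job and use something like sidekiq-cron to schedule it. That job is responsible for iterating over the files on the FTP server, downloading them, and then enqueuing a job to parse each file (you'd pass the job the path to the file). In that second job you parse the file, save whatever needs to be saved, and finally delete the file.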
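The "list the files, then enqueue one job per file" step can be sketched in plain Ruby so that it runs without Sidekiq or Net::SFTP installed. Everything named here (OrderFileFanOut, RemoteEntry, the two lambdas) is hypothetical and not part of either library. The sketch also assumes the path handed to each job is the remote path, so the per-file job does its own download; downloading in the scheduling job works too if every worker can read the same local disk.

```ruby
# Plain-Ruby sketch of the fan-out step. All names are hypothetical.

# Minimal stand-in for a remote directory entry.
RemoteEntry = Struct.new(:name, :directory) do
  def file?
    !directory
  end
end

class OrderFileFanOut
  # list_entries: callable returning the entries of a remote directory
  # enqueue:      callable that schedules one parsing job for a path
  def initialize(list_entries:, enqueue:)
    @list_entries = list_entries
    @enqueue = enqueue
  end

  # Enqueues one job per regular file and returns the enqueued paths.
  def call(remote_dir)
    @list_entries.call(remote_dir)
                 .select(&:file?)
                 .map { |entry| File.join(remote_dir, entry.name) }
                 .each { |path| @enqueue.call(path) }
  end
end

# Usage with in-memory stand-ins for the SFTP listing and the job queue:
queue = []
entries = [
  RemoteEntry.new('.', true),
  RemoteEntry.new('orders_a.txt', false),
  RemoteEntry.new('orders_b.txt', false)
]
fan_out = OrderFileFanOut.new(
  list_entries: ->(_dir) { entries },
  enqueue: ->(path) { queue << path }
)
fan_out.call('/dir')
```

In a real app, `list_entries` would presumably wrap `sftp.dir.entries(dir)` and `enqueue` would call `perform_async(path)` on your own per-file job class, with the whole thing invoked from the scheduled job's `perform` method.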
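That architecture takes advantage of your various Sidekiq workers by letting them parse files concurrently. Additionally, if parsing one file fails, it won't prevent the rest of the files from being parsed. You can rely on Sidekiq's built-in retry logic to retry parsing that file until it succeeds (either because of external factors or because of a bug fix). You also end up with faster jobs, each focused on one very specific task ("find all the files that need to be parsed", "parse this specific file").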
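To make the retry behaviour concrete, here is a minimal plain-Ruby sketch of the per-file step, again runnable without Sidekiq or Net::SFTP. The class name ProcessOrderFile and the three injected lambdas are hypothetical. The point it illustrates is ordering: the remote file is deleted only after the import has succeeded, and any failure is raised rather than swallowed, so a runner that retries on exceptions (as Sidekiq does by default) still finds the file on the next attempt.

```ruby
# Plain-Ruby sketch of the per-file step. All names are hypothetical.
class ProcessOrderFile
  def initialize(download:, import:, delete:)
    @download = download # path -> file contents (String)
    @import   = import   # contents -> parses and saves to the DB
    @delete   = delete   # path -> removes the remote file
  end

  # Raises if any step fails. The delete runs last, so a failed
  # download or import leaves the remote file in place for a retry.
  def call(path)
    contents = @download.call(path)
    @import.call(contents)
    @delete.call(path)
    path
  end
end

# Usage with in-memory stand-ins for the SFTP server and the database:
remote = { '/dir/good.txt' => 'order=1', '/dir/bad.txt' => 'garbage' }
saved  = []

import = ->(text) {
  raise ArgumentError, 'unparseable file' unless text.start_with?('order=')
  saved << text
}

step = ProcessOrderFile.new(
  download: ->(path) { remote.fetch(path) },
  import:   import,
  delete:   ->(path) { remote.delete(path) }
)

step.call('/dir/good.txt') # imported, then deleted from `remote`

begin
  step.call('/dir/bad.txt') # import raises, so the file is not deleted
rescue ArgumentError
  # Rescued here only to keep the demo running; a real job would let
  # the error propagate so that the job runner can schedule a retry.
end
```

One thing to keep in mind with this ordering: if the delete itself fails after a successful import, a retry will import the same file again, so the import step should be safe to repeat (for example by skipping orders that already exist).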