What is the best way to populate a DB from a large ruby/json file?

Problem description

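Say I have a ruby or json file made up of hashes, somewhere in the range of 14-20MB (300K lines unminified). I created a rake task that loops through each hash and creates AR objects from the values in each hash.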

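Unfortunately, because of the file's size, I get a "stack level too deep" error every time I run the task. The only way I have actually gotten the script to run is by splitting the file into smaller files. That works, but splitting the file and re-running the task over and over gets very tedious. Are there any good options for loading/running large files?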

Rake task

namespace :db do
  task populate: :environment do
    $restaurants.each_with_index do |r, index|
      uri = URI(r[:website])

      restaurant = Restaurant.find_or_create_by(name: r[:name], website: "#{uri.scheme}://#{uri.host}")

      restaurant.cuisines = r[:cuisines].map { |c| Cuisine.find_or_create_by(name: c) }

      location = Location.create(
        restaurant: restaurant,
        city_id: 1,
        address: r[:address],
        latitude: r[:latitude],
        longitude: r[:longitude],
        phone_number: r[:phone_number]
      )

      r[:hours].each do |h|
        Hour.create(
          location: location,
          day: Date::DAYNAMES.index(h[:day]),
          opens: h[:opens],
          closes: h[:closes]
        )
      end

      menu_group = MenuGroup.create(
        restaurant: restaurant,
        locations: [location],
        address: r[:address]
      )

      r[:menus].each do |m|
        menu = Menu.create(
          menu_group: menu_group,
          position: m[:position],
          name: m[:name]
        )

        m[:sections].each do |s|
          section = Section.create(
            menu: menu,
            position: s[:position],
            name: s[:name]
          )

          s[:dishes].each do |d|
            tag = Tag.find_or_create_by(
              name: d[:name].downcase.strip
            )

            Dish.find_or_create_by(
              restaurant: restaurant,
              sections: [section],
              tags: [tag],
              name: d[:name],
              description: d[:description]
            )
          end
        end
      end

      puts "#{index + 1} of #{$restaurants.size} completed"
    end
  end
end

Error

rake aborted!
SystemStackError: stack level too deep
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:12:in `to_binary'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:12:in `input_to_storage'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:37:in `fetch'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/compile_cache/iseq.rb:37:in `load_iseq'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:21:in `require'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:21:in `block in require_with_bootsnap_lfi'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/loaded_features_index.rb:65:in `register'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:20:in `require_with_bootsnap_lfi'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:29:in `require'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283:in `block in require'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:249:in `load_dependency'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:283:in `require'
/Users/user/app/lib/tasks/populate.rake:1:in `<main>'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:50:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/bootsnap-1.3.1/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:50:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277:in `block in load'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:249:in `load_dependency'
/usr/local/lib/ruby/gems/2.5.0/gems/activesupport-5.2.0/lib/active_support/dependencies.rb:277:in `load'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `block in run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `each'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:650:in `run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/application.rb:515:in `run_tasks_blocks'
/usr/local/lib/ruby/gems/2.5.0/gems/railties-5.2.0/lib/rails/engine.rb:459:in `load_tasks'
/Users/user/app/Rakefile:6:in `<top (required)>'
/usr/local/lib/ruby/gems/2.5.0/gems/rake-12.3.1/exe/rake:27:in `<top (required)>'
(See full trace by running task with --trace)

Tags: ruby-on-rails, ruby, activerecord, rake

Solution


I would use something like Sidekiq to break the work up into jobs that workers can process concurrently.

For example:

$restaurants.each_with_index do |r, index|
    RestaurantParser.perform_async(r, index)
end

Inside RestaurantParser, perform the steps you would normally take for a single restaurant.
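One detail to keep in mind when moving the loop body into a worker: Sidekiq serializes job arguments as JSON, so a symbol-keyed hash like r reaches the worker with string keys. The sketch below only illustrates that effect with Ruby's standard json library. The sample record is hypothetical, and the JSON round trip is an approximation of what the worker's perform method would receive, not Sidekiq's actual code path.

```ruby
require "json"

# Hypothetical record shaped like one element of $restaurants.
record = {
  name: "Example Diner",
  cuisines: ["thai"],
  hours: [{ day: "Monday", opens: "09:00", closes: "17:00" }]
}

# Assumption: a JSON round trip approximates what RestaurantParser#perform
# receives, because Sidekiq stores job arguments as JSON.
received = JSON.parse(JSON.generate(record))

# Symbol lookups such as r[:name] no longer find anything on the received hash.
name_by_symbol = received[:name]
name_by_string = received["name"]

# Re-symbolizing the keys restores the shape the rake task's code expects.
restored = JSON.parse(JSON.generate(received), symbolize_names: true)
```

In a Rails app, calling ActiveSupport's deep_symbolize_keys on the argument at the top of perform should have the same effect, which would let the body of the rake task's loop move into the worker largely unchanged.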

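As long as no restaurant depends on another restaurant already existing in the database, you can run the workers concurrently to speed up the process.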

