Do I have to declare a constant to import my module data from my scraper file?

Problem Description

Not sure what I'm doing wrong. I want to import the data from my scraper file, scraper.rb, into the application.

I can't understand why I'm getting this error, or why I have to declare a constant named Scraper as the error suggests.

Puma caught this error: expected file /Users/jmwofford/Desktop/Dev/scratchpad/scratch2_PRIMARY/projects/rails_scraper/scraperProj/app/controllers/scraper.rb to define constant Scraper, but didn't (Zeitwerk::NameError)

My code is given below.

scraper.rb

require 'net/http'
require 'uri'
require 'json'
require "awesome_print"
require 'nokogiri'
require 'httparty'
require 'mechanize'

module ScraperFinder
    def scrape_essential_data
        uri = URI.parse("https://buildout.com/plugins/4b4283d94258de190a1a5163c34c456f6b1294a2/inventory")
        request = Net::HTTP::Get.new(uri)
        request.content_type = "application/x-www-form-urlencoded; charset=UTF-8"
        request["Authority"] = "buildout.com"
        request["Accept"] = "application/json, text/javascript, */*; q=0.01"
        request["X-Newrelic-Id"] = "Vg4GU1RRGwIJUVJUAwY="
        request["Dnt"] = "1"
        request["X-Requested-With"] = "XMLHttpRequest"
        request["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
        request["Origin"] = "https://buildout.com"
        request["Sec-Fetch-Site"] = "same-origin"
        request["Sec-Fetch-Mode"] = "cors"
        request["Sec-Fetch-Dest"] = "empty"
        request["Referer"] = "https://buildout.com/plugins/4b4283d94258de190a1a5163c34c456f6b1294a2/leasespaces.jll.com/inventory/?pluginId=0&iframe=true&embedded=true&cacheSearch=true&=undefined"
        request["Accept-Language"] = "en-US,en;q=0.9"
        
        req_options = {
            use_ssl: uri.scheme == "https",
        }
        
        response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
            http.request(request)
        end
        
        json = JSON.parse(response.body)
        
        props = json['inventory']
        
        props.each do |listing|
            
            property = {
                'id' => listing['id'],
                'name' => listing['name'],
                'address' => listing['address_one_line'],
                'description' => listing['id'],
                'property_type' => listing['property_sub_type_name'],
                'attr' => listing['index_attributes'],
                'latitude'=> listing['latitude'],
                'longitude' => listing['longitude'],
                'picture' => listing['photo_url'],
                'sizing' => listing['size_summary'],
                'link' => listing['show_link'],
                'brokerContacts' => listing['broker_contacts']
            }
            
            Property.create(
                name: property.name,
                address: property.address,
                description: property.description,
                property_type: property.property_type,
                lat: property.latitude,
                lon: property.longitude,
                pic: property.picture,
                size: property.sizing,
                link: property.link,
                brokerContact: property.brokerContacts
            )
            p "==========================================================================================="
            # pp property
        end
    end 
end    

users_controller.rb

require_relative ("./scraper.rb")
include ScraperFinder
class UsersController < ApplicationController
    def index
        @scraped = ScraperFinder.scrape_essential_data
    end
end

index.html.erb

<!DOCTYPE html>

<html>
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <link rel="stylesheet" href="">
    </head>
    <body>
        
        <% @scraped.each do |s|%>
           <div class="prop_container"> <%= s %>  </div>   
        <%end%>
        
        <script src="" ></script>
    </body>
</html>

Schema

create_table "properties", force: :cascade do |t|
    t.string "name"
    t.string "address"
    t.string "description"
    t.string "property_type"
    t.string "attr"
    t.string "lat"
    t.string "lon"
    t.string "pic"
    t.string "size"
    t.string "link"
    t.string "brokerContact"
    t.datetime "created_at", precision: 6, null: false
    t.datetime "updated_at", precision: 6, null: false
end

create_table "users", force: :cascade do |t|
    t.string "name"
    t.string "email"
    t.datetime "created_at", precision: 6, null: false
    t.datetime "updated_at", precision: 6, null: false
end

Tags: ruby-on-rails, ruby

Solution

Zeitwerk (the autoloader used in Rails 6+) assumes that each file it manages declares a constant with a matching name. scraper.rb is therefore expected to declare the constant Scraper. Unlike the classic autoloader, Zeitwerk walks your autoload directories at boot and indexes every file, which is why it complains about Scraper even though you never reference that constant.
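
As an illustration of the convention (the last path below is hypothetical):

# Zeitwerk camelizes each path relative to its autoload root:
#   app/controllers/scraper.rb  must define Scraper
#   app/lib/scraper_finder.rb   must define ScraperFinder
#   app/models/admin/user.rb    must define Admin::User (nested directories become namespaces)

# app/lib/scraper_finder.rb - this definition satisfies the convention
module ScraperFinder
end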

You can configure Zeitwerk to ignore certain files or folders (see the sketch below), but you should really get with the program and adapt your code to the autoloader. Start by renaming the file to scraper_finder.rb. It also doesn't belong in the controllers directory, since it isn't a controller; putting it in app/lib, app/clients, or anywhere else really is more appropriate.
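
If you did want to opt the file out of autoloading instead, a minimal sketch looks like this (the module name assumes the application class Rails generated for scraperProj, and the path assumes the file stays where it is, which, again, is not recommended):

# config/application.rb
module ScraperProj
  class Application < Rails::Application
    config.load_defaults 6.0

    # Opt-out: Zeitwerk will no longer index or autoload this file,
    # so you would have to require it manually yourself.
    Rails.autoloaders.main.ignore(Rails.root.join("app/controllers/scraper.rb"))
  end
end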

That's really just the tip of the iceberg, though, because this code has a lot of problems. What you actually want is something like this:

# app/lib/scraper_finder.rb
require 'net/http'
# You don't need to require your gems as they are required by Bundler during startup

module ScraperFinder
  # You need `self.` to make the method callable as `ScraperFinder.scrape_essential_data`
  def self.scrape_essential_data
    uri = URI.parse("https://buildout.com/plugins/4b4283d94258de190a1a5163c34c456f6b1294a2/inventory")
    req_options = {
      use_ssl: uri.scheme == "https"
    }
    response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
      http.request(get(uri))
    end
    json = JSON.parse(response.body)
    json['inventory'].map do |listing|
      Property.create(extract_attributes(listing))
    end
  end

  def self.extract_attributes(listing)
    listing.slice('name').symbolize_keys.merge(
      address:       listing['address_one_line'],
      description:   listing['id'],
      property_type: listing['property_sub_type_name'],
      attr:          listing['index_attributes'],
      lat:           listing['latitude'],
      lon:           listing['longitude'],
      pic:           listing['photo_url'],
      size:          listing['size_summary'],
      link:          listing['show_link'],
      brokerContact: listing['broker_contacts']
    )
  end
  # `private` has no effect on `def self.` methods - use private_class_method
  private_class_method :extract_attributes

  def self.get(uri)
    # `.tap` yields the request and then returns it; `.then` would return the
    # block's last expression (a header string) instead of the request
    Net::HTTP::Get.new(uri).tap do |req|
      req.content_type = "application/x-www-form-urlencoded; charset=UTF-8"
      req["Authority"] = "buildout.com"
      req["Accept"] = "application/json, text/javascript, */*; q=0.01"
      req["X-Newrelic-Id"] = "Vg4GU1RRGwIJUVJUAwY="
      req["Dnt"] = "1"
      req["X-Requested-With"] = "XMLHttpRequest"
      req["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
      req["Origin"] = "https://buildout.com"
      req["Sec-Fetch-Site"] = "same-origin"
      req["Sec-Fetch-Mode"] = "cors"
      req["Sec-Fetch-Dest"] = "empty"
      req["Referer"] = "https://buildout.com/plugins/4b4283d94258de190a1a5163c34c456f6b1294a2/leasespaces.jll.com/inventory/?pluginId=0&iframe=true&embedded=true&cacheSearch=true&=undefined"
      req["Accept-Language"] = "en-US,en;q=0.9"
    end
  end
  private_class_method :get
end

# app/controllers/users_controller.rb
class UsersController < ApplicationController
  def index
    @scraped = ScraperFinder.scrape_essential_data
  end
end

A Hash in Ruby is not like an object in JavaScript, nor like a Struct, so your code will raise a NoMethodError on property.name. Use square brackets to access hash attributes: property['name']. But as the refactored code shows, all that duplication was never warranted in the first place, since Ruby has excellent methods for working with hashes.
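
For example, with a trimmed-down hash (the values here are made up):

property = { 'name' => 'Example Tower', 'address' => '123 Main St' }

property.name             # => NoMethodError: Hash has no `name` method
property['name']          # => "Example Tower"
property.fetch('address') # => "123 Main St"; fetch raises KeyError if the key is missing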

Your method is also declared as an instance method, yet you call it as ScraperFinder.scrape_essential_data. To make it a module method you need def self.scrape_essential_data.
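
The difference, shown with a minimal hypothetical module:

module Example
  def as_instance       # instance method: only usable where Example is included
    "instance"
  end

  def self.as_module    # module method: callable directly on the module
    "module"
  end
end

Example.as_module    # => "module"
Example.as_instance  # => NoMethodError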

Some quick refactoring also splits this clunky monolith into three separate methods that are easier to read and reason about.

You don't need to manually require your own code if it lives in the app directory, and doing so will only introduce bugs. Use the autoloader.

include ScraperFinder copies all of the module's methods into the global scope, because you're calling it outside of any module/class! And since your module should only have singleton methods (methods called on the module itself), you don't need to include it anywhere at all.
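
A small sketch (with a hypothetical Leaky module) of why a top-level include is dangerous:

module Leaky
  def surprise
    "hello from Leaky"
  end
end

include Leaky  # at the top level, this mixes Leaky into Object itself

"any string".surprise  # => "hello from Leaky" - every object now responds
42.surprise            # => "hello from Leaky"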

