Why does scrapy return an error saying "'LoginSpider' object has no attribute 'logged_in'"?

Problem description

I am trying to use scrapy to scrape data from a website that I have to log in to first. For that reason, I have been trying to use the LoginSpider outlined here: https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-userlogin

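My script is very similar to the example given at the link above. Recently, however, when I try to run the script from the command line, I get a long error message whose last line is: "builtins.AttributeError: 'LoginSpider' object has no attribute 'logged_in'". I have looked around other forums for answers to this problem, but could not find anything that seems to address this specific issue.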

My entire spider follows.

import scrapy
from scrapy.http import FormRequest
from scrapy.spiders import Spider
def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass
class LoginSpider(scrapy.Spider):
    name = 'wine'
    start_urls=['https://www.jancisrobinson.com/#login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'cathy@enolytics.com', 'password': 'purple'},
            callback=self.after_login
        )
    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        else:
            self.logger.error("Login succeeded!")
            item = SampleItem()
            item["quote"] = response.css(".text").extract()
            item["author"] = response.css(".author").extract()
            return item
    def start_requests(self):
        return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                                   formdata={'user': 'john', 'pass': 'secret'},
                                   callback=self.logged_in)]

As requested, here is the full command-line traceback generated the last time I ran this spider:

Traceback (most recent call last):
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 192, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 196, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
--- <exception caught here> ---
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/myspider.py", line 24, in start_requests
    callback=self.logged_in)]
builtins.AttributeError: 'LoginSpider' object has no attribute 'logged_in'

2020-06-12 10:02:24 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
    start_requests = iter(self.spider.start_requests())
  File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/myspider.py", line 24, in start_requests
    callback=self.logged_in)]
AttributeError: 'LoginSpider' object has no attribute 'logged_in'

Tags: python, scrapy

Solution


This happens because you have not defined a logged_in method in your spider class. You reference this method here:

return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                           formdata={'user': 'john', 'pass': 'secret'},
                           callback=self.logged_in)]

Once scrapy has sent this request and received the response, it calls self.logged_in with that response. You need to define this method:

class LoginSpider(scrapy.Spider):
    name = 'wine'
    start_urls=['https://www.jancisrobinson.com/#login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'cathy@enolytics.com', 'password': 'purple'},
            callback=self.after_login)

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        else:
            self.logger.info("Login succeeded!")
            item = SampleItem()
            item["quote"] = response.css(".text").extract()
            item["author"] = response.css(".author").extract()
            return item

    def start_requests(self):
        return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                                   formdata={'user': 'john', 'pass': 'secret'},
                                   callback=self.logged_in)]

    def logged_in(self, response):
        # do something here
        pass
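
To see why the error appears as soon as the crawl starts, before any request is actually sent, here is a minimal sketch that does not depend on scrapy. FakeRequest, BrokenSpider, FixedSpider, try_start and the example.com URL are all hypothetical names made up for this illustration. The point is only that self.logged_in is looked up while start_requests is building the request, which matches the traceback above, where the failure occurs inside start_requests.

```python
class FakeRequest:
    """Hypothetical stand-in for scrapy.FormRequest; it only stores the callback."""

    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


class BrokenSpider:
    def start_requests(self):
        # self.logged_in is looked up right here, while the list is being built,
        # so a missing method fails immediately.
        return [FakeRequest("https://example.com/login", callback=self.logged_in)]


class FixedSpider(BrokenSpider):
    # Defining the method is enough for the attribute lookup to succeed.
    def logged_in(self, response):
        return f"handled {response}"


def try_start(spider):
    """Return the AttributeError message, or None if start_requests succeeded."""
    try:
        spider.start_requests()
    except AttributeError as exc:
        return str(exc)
    return None


broken_error = try_start(BrokenSpider())
fixed_error = try_start(FixedSpider())

# The stored callback is only invoked later, once a response is available.
request = FixedSpider().start_requests()[0]
result = request.callback("fake response")

print(broken_error)  # message mentions the missing 'logged_in' attribute
print(fixed_error)
print(result)
```

In the real spider, scrapy plays the role of the last few lines: it keeps the callback on the request and calls it with the response when the download finishes.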

Or ...

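You need to change self.logged_in to one of your existing methods, for example self.after_login. Let me know if this helps; if not, feel free to ask any questions and I will be happy to help.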

