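python - Why does Scrapy return the error "'LoginSpider' object has no attribute 'logged_in'"?
Problem description
I am trying to use Scrapy to scrape data from a website that I have to log in to first. For that reason, I have been trying to use the LoginSpider outlined here: https://docs.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-userlogin
My script is very similar to the example given at that link. Recently, however, when I run the script from the command line, I get a long error message whose last line is: "builtins.AttributeError: 'LoginSpider' object has no attribute 'logged_in'". I have looked around other forums for answers to this problem, but could not find anything that seems to address this specific issue.
Here is my entire spider.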
import scrapy
from scrapy.http import FormRequest
from scrapy.spiders import Spider

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass

class LoginSpider(scrapy.Spider):
    name = 'wine'
    start_urls=['https://www.jancisrobinson.com/#login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'cathy@enolytics.com', 'password': 'purple'},
            callback=self.after_login
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        else:
            self.logger.error("Login succeeded!")
            item = SampleItem()
            item["quote"] = response.css(".text").extract()
            item["author"] = response.css(".author").extract()
            return item

    def start_requests(self):
        return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                                   formdata={'user': 'john', 'pass': 'secret'},
                                   callback=self.logged_in)]
As requested, here is the full command-line traceback generated the last time I ran this spider:
Traceback (most recent call last):
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 192, in crawl
return self._crawl(crawler, *args, **kwargs)
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 196, in _crawl
d = crawler.crawl(*args, **kwargs)
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
return _cancellableInlineCallbacks(gen)
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
_inlineCallbacks(None, g, status)
--- <exception caught here> ---
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
start_requests = iter(self.spider.start_requests())
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/myspider.py", line 24, in start_requests
callback=self.logged_in)]
builtins.AttributeError: 'LoginSpider' object has no attribute 'logged_in'
2020-06-12 10:02:24 [twisted] CRITICAL:
Traceback (most recent call last):
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
result = g.send(result)
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/venv/lib/python3.6/site-packages/scrapy/crawler.py", line 88, in crawl
start_requests = iter(self.spider.start_requests())
File "/Users/jinkinsonsmith/Documents/Scripting/my_project/spiders/tutorial/tutorial/spiders/myspider.py", line 24, in start_requests
callback=self.logged_in)]
AttributeError: 'LoginSpider' object has no attribute 'logged_in'
Solution
The error occurs because your spider class does not define a logged_in method. You reference this method here:
return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                           formdata={'user': 'john', 'pass': 'secret'},
                           callback=self.logged_in)]
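The callback argument tells Scrapy to call self.logged_in with the response once this request has been made. Note that self.logged_in is looked up as soon as start_requests runs, which is why the traceback shows the AttributeError being raised inside start_requests, before any request is sent. You need to define this method: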
class LoginSpider(scrapy.Spider):
    name = 'wine'
    start_urls=['https://www.jancisrobinson.com/#login']

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'cathy@enolytics.com', 'password': 'purple'},
            callback=self.after_login)

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        else:
            self.logger.error("Login succeeded!")
            item = SampleItem()
            item["quote"] = response.css(".text").extract()
            item["author"] = response.css(".author").extract()
            return item

    # Overriding start_requests means start_urls is no longer requested automatically.
    def start_requests(self):
        return [scrapy.FormRequest("https://www.jancisrobinson.com/#login",
                                   formdata={'user': 'john', 'pass': 'secret'},
                                   callback=self.logged_in)]

    # The callback referenced in start_requests; it receives the response.
    def logged_in(self, response):
        # do something here
        pass
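Or ...
Change self.logged_in in start_requests to one of the existing methods, such as after_login. Let me know if this helps; if not, feel free to ask any questions and I will be happy to help.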
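The two fixes above can be sketched without Scrapy. In the minimal example below, FakeRequest is a hypothetical stand-in for scrapy.FormRequest that only stores its arguments; the spider class names and the example.com URL are placeholders rather than anything from the question, and nothing is sent over the network. It is only meant to illustrate that self.logged_in is resolved when start_requests runs, and that either defining the method or pointing the callback at an existing one avoids the AttributeError.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.FormRequest; it only stores its arguments."""
    url: str
    callback: Callable


class BrokenSpider:
    def start_requests(self):
        # self.logged_in is looked up right here, while the list is being built,
        # so a missing method fails before any request object exists.
        return [FakeRequest("https://example.com/login", callback=self.logged_in)]


class DefinedCallbackSpider(BrokenSpider):
    # Fix 1: define the method that start_requests refers to.
    def logged_in(self, response):
        return "logged_in got " + response


class ExistingCallbackSpider:
    # Fix 2: point the callback at a method that already exists.
    def after_login(self, response):
        return "after_login got " + response

    def start_requests(self):
        return [FakeRequest("https://example.com/login", callback=self.after_login)]


# The broken spider fails as soon as start_requests is called.
try:
    BrokenSpider().start_requests()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Both fixed spiders build their request and the stored callback can be invoked.
for spider_cls in (DefinedCallbackSpider, ExistingCallbackSpider):
    request = spider_cls().start_requests()[0]
    print(request.callback("a response"))
```

In a real spider, Scrapy itself invokes the stored callback with the downloaded response; the loop at the end only imitates that step by hand.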