python - Why can't scrapy FormRequest log in?
Problem description
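I am trying to log in to https://ptab.uspto.gov/#/login with scrapy.FormRequest. My code is below. When I run it in the terminal, Scrapy outputs no item and reports that it crawled 0 pages. What is wrong with my code that prevents it from logging in?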
import scrapy
from scrapy.utils.response import open_in_browser

from ..items import PatentItem


class LoginNeedScraper(scrapy.Spider):
    name = 'ptab'
    # Must be a list (or a tuple with a trailing comma) of URLs
    start_urls = ['https://ptab.uspto.gov/#/login']

    def parse(self, response):
        # Fill in and submit the first <form> found in the downloaded page
        return scrapy.FormRequest.from_response(
            response,
            formdata={'userName': 'username', 'password': 'password'},
            callback=self.logged_in,
        )

    def logged_in(self, response):
        # Open the response in a browser to inspect what the server returned
        open_in_browser(response)
        item = PatentItem()
        item['message'] = response.css('h1::text').extract()
        return item
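One detail worth checking in the spider: in Python, parentheses around a single string without a trailing comma do not create a tuple, so start_urls = ('https://ptab.uspto.gov/#/login') binds a plain str. Scrapy's default start_requests loops over start_urls, and looping over a str yields its characters one by one. The snippet below only demonstrates the Python behaviour; it does not claim this alone explains the log above.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
as_string = ('https://ptab.uspto.gov/#/login')
as_tuple = ('https://ptab.uspto.gov/#/login',)
as_list = ['https://ptab.uspto.gov/#/login']

print(type(as_string).__name__)  # str
print(type(as_tuple).__name__)   # tuple

# Iterating the str (as a "for url in start_urls" loop would) yields characters:
print(list(as_string)[:5])       # ['h', 't', 't', 'p', 's']

# A one-element tuple or a list both iterate over whole URLs:
print(list(as_tuple) == as_list)  # True
```

Using a list, as most Scrapy examples do, avoids the trailing-comma pitfall entirely.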
Here is the output in the terminal:
(Scrape) (base) Andrews-MacBook-Pro-5:patent rhodes259$ scrapy crawl ptab -o data.json
2021-03-16 01:10:02 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: patent)
2021-03-16 01:10:02 [scrapy.utils.log] INFO: Versions: lxml 4.6.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.6.3 (v3.6.3:2c5fed86e0, Oct 3 2017, 00:32:08) - [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1j 16 Feb 2021), cryptography 3.4.6, Platform Darwin-19.6.0-x86_64-i386-64bit
2021-03-16 01:10:02 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-03-16 01:10:02 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'patent',
'NEWSPIDER_MODULE': 'patent.spiders',
'ROBOTSTXT_OBEY': True,
'SPIDER_MODULES': ['patent.spiders']}
2021-03-16 01:10:02 [scrapy.extensions.telnet] INFO: Telnet Password: 93dadadb5f6c58a8
2021-03-16 01:10:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2021-03-16 01:10:02 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-03-16 01:10:02 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-03-16 01:10:02 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-03-16 01:10:02 [scrapy.core.engine] INFO: Spider opened
2021-03-16 01:10:02 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-03-16 01:10:02 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-03-16 01:10:02 [scrapy.core.engine] INFO: Closing spider (finished)
2021-03-16 01:10:02 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 0.006319,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 3, 16, 5, 10, 2, 926018),
'log_count/INFO': 10,
'memusage/max': 60981248,
'memusage/startup': 60981248,
'start_time': datetime.datetime(2021, 3, 16, 5, 10, 2, 919699)}
2021-03-16 01:10:02 [scrapy.core.engine] INFO: Spider closed (finished)
Solution
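The login page appears to be a JavaScript single-page app: everything after the # in https://ptab.uspto.gov/#/login is a URL fragment, which is never sent to the server, so the HTML that Scrapy downloads most likely does not contain the login form that FormRequest.from_response looks for. When you click Log In in the browser, the POST request is sent to https://ptab.uspto.gov/ptabe2e/rest/login, so send your login request to that endpoint directly instead.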