python - How to manually throw a 503 error on scrapy?
Problem description
I am scraping Amazon, and I want to throw a 503 error whenever the site returns a captcha, so that the page is retried later. I can already detect the captcha on the page; I just need a way to raise the error. Ideally, it would look something like this:
if response.css('#captchacharacters').extract()[0]:
    # Insert code to throw an error
Solution
Try using a downloader middleware like the one below:
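from scrapy.downloadermiddlewares.retry import RetryMiddleware

class TutorialDownloaderMiddleware(RetryMiddleware):
    def process_response(self, request, response, spider):
        # Test for a captcha page. An empty selector list is falsy, so this
        # avoids the IndexError that .extract()[0] raises when nothing matches.
        if response.css('#captchacharacters'):
            reason = 'captcha'
            # _retry() returns None once the retry limit is reached;
            # fall back to the original response in that case.
            return self._retry(request, reason, spider) or response
        return response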
Don't forget to enable it in your settings:
DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.TutorialDownloaderMiddleware': 543,
}
The captcha request is then rescheduled and retried, up to the limit set by RETRY_TIMES.
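How often a captcha page is retried depends on Scrapy's retry settings, which RetryMiddleware (and therefore a subclass of it) reads at startup. A minimal sketch of the relevant entries in settings.py follows; the values are illustrative assumptions, not recommendations from the original answer:

```python
# settings.py (fragment) -- illustrative values, tune for your own crawl

# The retry middleware is disabled entirely when this is False.
RETRY_ENABLED = True

# Maximum number of retries per request, in addition to the first download.
# Scrapy's default is 2; captcha pages may need more attempts.
RETRY_TIMES = 5

# Priority adjustment applied to retried requests. A negative value
# (the default is -1) schedules retries after other pending requests.
RETRY_PRIORITY_ADJUST = -1
```

Once RETRY_TIMES is exhausted, the middleware gives up on the request and the captcha response is passed through to the spider, so the callback should still be prepared to see it.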