How to manually throw a 503 error on scrapy?

Question

I am scraping Amazon, and I want to throw a 503 error whenever the website serves me a captcha, so that the page is retried later. I can already detect whether a page contains a captcha; I just need a way to throw the 503 error. Ideally, it would look something like this:

 if response.css('#captchacharacters'):
     # Insert code to throw an error

Tags: python, scrapy

Answer


Try using a downloader middleware like the one below:

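from scrapy.downloadermiddlewares.retry import RetryMiddleware

class TutorialDownloaderMiddleware(RetryMiddleware):

    def process_response(self, request, response, spider):
        # Honor requests that explicitly opt out of retrying
        if request.meta.get('dont_retry', False):
            return response
        # Test for a captcha page. An empty selector list is falsy, so pages
        # without the element pass through (indexing [0] would raise IndexError)
        if response.css('#captchacharacters'):
            reason = 'captcha'
            # _retry returns None once the retry limit is reached;
            # in that case the captcha response is passed on unchanged
            return self._retry(request, reason, spider) or response
        return response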
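The `or response` fallback in the middleware matters because `_retry` only hands back a new request while the retry budget lasts. Here is a minimal sketch of that control flow in plain Python, without Scrapy. `fake_retry`, `handle_captcha` and `MAX_RETRIES` are hypothetical stand-ins invented for illustration, not Scrapy APIs, and the dict stands in for `request.meta`:

```python
from typing import Optional, Union

MAX_RETRIES = 2  # assumed budget; plays the role of Scrapy's RETRY_TIMES


def fake_retry(meta: dict) -> Optional[dict]:
    """Hypothetical stand-in for RetryMiddleware._retry.

    Returns updated request metadata while retries remain,
    or None once the budget is spent.
    """
    retries = meta.get("retry_times", 0) + 1
    if retries <= MAX_RETRIES:
        return {**meta, "retry_times": retries}
    return None


def handle_captcha(meta: dict, response: str) -> Union[dict, str]:
    # Same shape as `self._retry(...) or response` in the middleware:
    # retry if possible, otherwise fall back to the response itself.
    return fake_retry(meta) or response


print(handle_captcha({}, "captcha page"))
print(handle_captcha({"retry_times": 2}, "captcha page"))
```

The first call yields metadata for a new attempt. The second has used up its budget, so the captcha page itself is returned and ends up in the spider callback, which should be prepared to handle it.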

Don't forget to enable the middleware in your settings:

DOWNLOADER_MIDDLEWARES = {
   'tutorial.middlewares.TutorialDownloaderMiddleware': 543,
}

The captcha page will then be rescheduled and retried, up to the limit set by RETRY_TIMES.
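Because the class subclasses RetryMiddleware, it reads the standard retry settings. As a rough sketch, these could sit next to DOWNLOADER_MIDDLEWARES in settings.py. The values are illustrative assumptions, not recommendations from the answer:

```python
# settings.py (fragment) -- values below are assumptions chosen for illustration

RETRY_ENABLED = True        # if False, RetryMiddleware (and subclasses) is disabled
RETRY_TIMES = 5             # retries per request on top of the first attempt; Scrapy's default is 2
RETRY_PRIORITY_ADJUST = -1  # retried requests are scheduled at a lower priority (the default)
DOWNLOAD_DELAY = 2          # seconds between requests to the same site; may reduce captchas
```

A higher RETRY_TIMES gives a captcha-blocked page more chances, but each retry is still an extra request to the site, so combining it with a delay is usually the gentler choice.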

