Scrapy-Splash returns 502 or 504 status

Problem description

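My problem is that I get 502 or 504 statuses, but not always. Sometimes the spider works exactly as it should, and sometimes some requests return 502 or 504. The affected requests are random each time. It also happens that the spider first reports a 502 or 504 status for a page, and later Scrapy reports that the same page was fetched and saved successfully (this change is visible in the log). How can that happen, and how do I get rid of these errors?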

import os
import re
import scrapy
from scrapy_splash import SplashRequest
from .utils.config import HTML_DIR


class CqSpider(scrapy.Spider):
    name = "cq"    

    def start_requests(self):
        urls = [
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn',
        ]

        for url in urls:

            yield SplashRequest(
                url=url,
                callback=self.save_response,
                endpoint='render.html',
                cb_kwargs=dict(path_dir=HTML_DIR),
                args={
                    'wait': 1.0,
                    'timeout': 90.0,
                },
            )


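    def save_response(self, response, path_dir, filename=None):
        # Derive a file name from the URL unless one was passed in.
        if not filename:
            filename = self.norm_url(response.url)
        print('===')
        print(filename)
        print('===')
        # path_dir must support the "/" operator, e.g. a pathlib.Path.
        with open(path_dir / filename, 'wb') as f:
            f.write(response.body)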
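
The spider calls `self.norm_url`, which is not included in the question. Judging only from the file names printed in the log (for example `tools.cisco.com-security-center-content-CiscoSecurityAdvisory-cisco-sa-20190501-asaftd-saml-vpn`), it appears to drop the URL scheme and replace slashes with hyphens. A minimal stand-in under that assumption is sketched below; the function body is a guess, not the author's code.

```python
import re


def norm_url(url: str) -> str:
    """Hypothetical stand-in for the spider's norm_url method.

    Assumption: the scheme is removed and every "/" becomes "-",
    which matches the file names visible in the log.
    """
    # Strip a leading scheme such as "https://".
    without_scheme = re.sub(r'^[a-zA-Z][a-zA-Z0-9+.-]*://', '', url)
    # Trim stray slashes at the ends, then flatten the path.
    return without_scheme.strip('/').replace('/', '-')


print(norm_url(
    'https://tools.cisco.com/security/center/content/'
    'CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn'
))
```

Inside the spider this would be a method (with `self` as the first parameter) or a `@staticmethod`.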

settings.py

BOT_NAME = 'splash'

SPIDER_MODULES = ['splash.spiders']
NEWSPIDER_MODULE = 'splash.spiders'
ROBOTSTXT_OBEY = False
DOWNLOAD_DELAY = 2
SPLASH_URL = 'http://172.35.0.1:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
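
These settings leave Scrapy's concurrency and retry defaults in place (`CONCURRENT_REQUESTS` is 16 and `RETRY_TIMES` is 2 unless overridden). Splash's documentation associates 504 with a render that exceeds its timeout, so one direction that may help is to send Splash fewer pages at once and to allow more retries. The fragment below is only a sketch of that idea: the numbers are illustrative assumptions, not values from the question, and whether they help depends on why Splash is failing.

```python
# Possible additions to settings.py (illustrative values only).

# Scrapy's default is 16 concurrent requests; a single Splash
# instance may cope better with fewer pages rendering at once.
CONCURRENT_REQUESTS = 2

# Scrapy's default is 2 retries after the first attempt. 502 and 504
# are already in the default RETRY_HTTP_CODES list, so raising the
# count is enough to give transient failures more chances.
RETRY_TIMES = 5
```

If the errors persist with low concurrency, the cause is more likely on the Splash side (its proxy, memory limits, or the target site) than in these Scrapy settings.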

Log:

docker-compose -f $(pwd)/tor/docker-compose.yml up -d
        Creating network "tor_tor_net" with driver "bridge"
        Creating tor ... done
        docker-compose -f $(pwd)/splash/docker-compose.yml up -d
        Creating network "splash_splash_net" with driver "bridge"
        Creating splash ... done
        docker run --rm --env-file .secrets --env-file .settings \
            --name cisco-crawler \
            -e DATA_DIR=/data \
            -v $(pwd)/data/crawlers/cisco:/data \
            -v $(pwd)/crawlers:/code \
            docker-secdev.ptsecurity.ru/crawler-base:fixed_cisco_crawler \
            cisco
        2020-08-18 15:51:01 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: splash)
        2020-08-18 15:51:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.8 (default, Aug  5 2020, 08:30:24) - [GCC 8.3.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g  21 Apr 2020), cryptography 3.0, Platform Linux-5.4.0-42-generic-x86_64-with-debian-10.5
        2020-08-18 15:51:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
        2020-08-18 15:51:01 [scrapy.crawler] INFO: Overridden settings:
        {'BOT_NAME': 'splash',
         'DOWNLOAD_DELAY': 2,
         'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
         'NEWSPIDER_MODULE': 'splash.spiders',
         'SPIDER_MODULES': ['splash.spiders']}
        2020-08-18 15:51:01 [scrapy.extensions.telnet] INFO: Telnet Password: a2db66b61d0f2c0d
        2020-08-18 15:51:01 [scrapy.middleware] INFO: Enabled extensions:
        ['scrapy.extensions.corestats.CoreStats',
         'scrapy.extensions.telnet.TelnetConsole',
         'scrapy.extensions.memusage.MemoryUsage',
         'scrapy.extensions.logstats.LogStats']
        2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Android
        2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV13, Brand: Sony, Model: BDV13
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
        2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
        2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV14, Brand: Sony, Model: BDV14
        2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LG Web0S SmartTV, Brand: LG, Model: Web0S SmartTV
        2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV11, Brand: Sony, Model: BDV11
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
        2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF LF-2403N, Brand: LYF, Model: LF-2403N
        2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
        2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Applebot
        2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Applebot
        2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
        2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
        2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: SMTBot
        2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: WebKit Nightly
        2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
        2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Robot
        2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
        2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Zune
        2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
        ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
         'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
         'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
         'scrapy_user_agents.middlewares.RandomUserAgentMiddleware',
         'scrapy.downloadermiddlewares.retry.RetryMiddleware',
         'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
         'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
         'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
         'scrapy_splash.SplashCookiesMiddleware',
         'scrapy_splash.SplashMiddleware',
         'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
         'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
         'scrapy.downloadermiddlewares.stats.DownloaderStats']
        2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled spider middlewares:
        ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
         'scrapy_splash.SplashDeduplicateArgsMiddleware',
         'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
         'scrapy.spidermiddlewares.referer.RefererMiddleware',
         'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
         'scrapy.spidermiddlewares.depth.DepthMiddleware']
        2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled item pipelines:
        []
        2020-08-18 15:51:11 [scrapy.core.engine] INFO: Spider opened
        2020-08-18 15:51:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
        2020-08-18 15:51:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
        2020-08-18 15:51:11 [py.warnings] WARNING: /usr/local/lib/python3.7/site-packages/scrapy_splash/request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
          url = to_native_str(url)
        
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
        2020-08-18 15:51:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
        2020-08-18 15:51:31 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
        2020-08-18 15:51:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
        2020-08-18 15:51:31 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
        2020-08-18 15:51:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
        2020-08-18 15:51:38 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
        2020-08-18 15:51:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
        2020-08-18 15:51:41 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36
        2020-08-18 15:51:46 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
        2020-08-18 15:51:46 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
        2020-08-18 15:52:06 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
        2020-08-18 15:52:06 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.170 Safari/537.36
        2020-08-18 15:52:06 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
        2020-08-18 15:52:06 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36
        2020-08-18 15:52:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
        2020-08-18 15:52:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (failed 1 times): 504 Gateway Time-out
        2020-08-18 15:52:51 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
        2020-08-18 15:52:59 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 3 times): 502 Bad Gateway
        2020-08-18 15:52:59 [scrapy.core.engine] DEBUG: Crawled (502) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (referer: None)
        2020-08-18 15:52:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb>: HTTP status code is not handled or not allowed
        2020-08-18 15:53:11 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 3 times): 504 Gateway Time-out
        2020-08-18 15:53:11 [scrapy.core.engine] DEBUG: Crawled (504) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (referer: None)
        2020-08-18 15:53:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <504 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU>: HTTP status code is not handled or not allowed
        2020-08-18 15:53:11 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
        2020-08-18 15:53:25 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 3 times): 502 Bad Gateway
        2020-08-18 15:53:25 [scrapy.core.engine] DEBUG: Crawled (502) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (referer: None)
        2020-08-18 15:53:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR>: HTTP status code is not handled or not allowed
        2020-08-18 15:53:39 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (failed 2 times): 504 Gateway Time-out
        2020-08-18 15:53:39 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36
        2020-08-18 15:54:11 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
        2020-08-18 15:54:21 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (failed 2 times): 504 Gateway Time-out
        2020-08-18 15:54:21 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
        2020-08-18 15:55:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (referer: None)
        ===
        tools.cisco.com-security-center-content-CiscoSecurityAdvisory-cisco-sa-20190501-asaftd-saml-vpn
        ===
        2020-08-18 15:55:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (referer: None)
        ===
        tools.cisco.com-security-center-content-CiscoSecurityAdvisory-cisco-sa-20190807-wms-oredirect
        ===
        2020-08-18 15:55:05 [scrapy.core.engine] INFO: Closing spider (finished)
        2020-08-18 15:55:05 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
        {'downloader/request_bytes': 12060,
         'downloader/request_count': 15,
         'downloader/request_method_count/POST': 15,
         'downloader/response_bytes': 454960,
         'downloader/response_count': 15,
         'downloader/response_status_count/200': 2,
         'downloader/response_status_count/502': 9,
         'downloader/response_status_count/504': 4,
         'elapsed_time_seconds': 233.602478,
         'finish_reason': 'finished',
         'finish_time': datetime.datetime(2020, 8, 18, 15, 55, 5, 265746),
         'httperror/response_ignored_count': 3,
         'httperror/response_ignored_status_count/502': 2,
         'httperror/response_ignored_status_count/504': 1,
         'log_count/DEBUG': 35,
         'log_count/ERROR': 3,
         'log_count/INFO': 16,
         'log_count/WARNING': 46,
         'memusage/max': 59252736,
         'memusage/startup': 58208256,
         'response_received_count': 5,
         'retry/count': 10,
         'retry/max_reached': 3,
         'retry/reason_count/502 Bad Gateway': 7,
         'retry/reason_count/504 Gateway Time-out': 3,
         'scheduler/dequeued': 20,
         'scheduler/dequeued/memory': 20,
         'scheduler/enqueued': 20,
         'scheduler/enqueued/memory': 20,
         'splash/render.html/request_count': 5,
         'splash/render.html/response_count/200': 2,
         'splash/render.html/response_count/502': 9,
         'splash/render.html/response_count/504': 4,
         'start_time': datetime.datetime(2020, 8, 18, 15, 51, 11, 663268)}
        2020-08-18 15:55:05 [scrapy.core.engine] INFO: Spider closed (finished)
        docker-compose -f $(pwd)/splash/docker-compose.yml down
        Stopping splash ... done
        Removing splash ... done
        Removing network splash_splash_net
        docker-compose -f $(pwd)/tor/docker-compose.yml down
        Stopping tor ... done
        Removing tor ... done
        Removing network tor_tor_net
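
The "Retrying" and "Gave up retrying" lines in this log come from Scrapy's built-in RetryMiddleware, which by default retries 502 and 504 responses up to `RETRY_TIMES = 2` more times. That is consistent with a page first showing 502 or 504 and later being saved: the failed attempt and the successful one are separate requests for the same URL. The tally below, using per-URL status codes read off the log (the shortened dictionary keys are labels chosen here for illustration), reproduces the counters in the stats dump.

```python
from collections import Counter

# Status code of each attempt per URL, in the order logged above.
attempts = {
    "ucs-director-xss": [502, 502, 502],
    "code-exec": [502, 502, 502],
    "dcnm-stored-xss": [502, 502, 504],
    "wms-oredirect": [502, 504, 200],
    "asaftd-saml-vpn": [504, 504, 200],
}

RETRY_TIMES = 2  # Scrapy's default: 2 retries after the first attempt

# Every attempt produces one downloader response.
responses = Counter(code for codes in attempts.values() for code in codes)
# Each attempt after the first one is a retry.
retries = sum(len(codes) - 1 for codes in attempts.values())
# A URL is abandoned when its last allowed attempt still fails.
gave_up = sum(
    1 for codes in attempts.values()
    if len(codes) == RETRY_TIMES + 1 and codes[-1] != 200
)
saved = [name for name, codes in attempts.items() if codes[-1] == 200]

print(dict(responses))  # compare with downloader/response_status_count/*
print(retries)          # compare with retry/count
print(gave_up)          # compare with retry/max_reached
print(saved)            # the two pages whose file names were printed
```

The totals match the dump: 15 responses (2 with 200, 9 with 502, 4 with 504), 10 retries, and 3 URLs that exhausted their attempts.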

Tags: python-3.x, scrapy-splash

Solution

