python-3.x - Scrapy-Splash returns 502 or 504 status
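Problem description
My problem is that I get 502 or 504 statuses, but not always. Sometimes the spider works exactly as it should, and sometimes certain requests return 502 or 504. Which requests fail appears to be random. It also happens that the spider first reports a 502 or 504 status for a page, and later Scrapy reports that the same page was fetched and saved successfully (this change is visible in the log). How is this possible, and how can I get rid of these errors?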
import os
import re

import scrapy
from scrapy_splash import SplashRequest

from .utils.config import HTML_DIR


class CqSpider(scrapy.Spider):
    name = "cq"

    def start_requests(self):
        urls = [
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect',
            'https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn',
        ]
        for url in urls:
            yield SplashRequest(
                url=url,
                callback=self.save_response,
                endpoint='render.html',
                cb_kwargs=dict(path_dir=HTML_DIR),
                args={
                    'wait': 1.0,
                    'timeout': 90.0,
                },
            )

    def save_response(self, response, path_dir, filename=None):
        if not filename:
            filename = self.norm_url(response.url)
        print('===')
        print(filename)
        print('===')
        with open(path_dir / filename, 'wb') as f:
            f.write(response.body)
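The spider calls self.norm_url, which the question does not show. Judging only by the filenames printed in the log (the URL without its scheme, with slashes turned into hyphens), a hypothetical standalone version might look like the sketch below. The function body is an assumption reconstructed from that output, not the author's code.

```python
import re


def norm_url(url: str) -> str:
    """Hypothetical helper: turn a URL into a flat filename."""
    # Drop the scheme prefix such as "https://", if present.
    without_scheme = re.sub(r'^[a-zA-Z][a-zA-Z0-9+.-]*://', '', url)
    # Replace path separators with hyphens and trim any leftover ones.
    return without_scheme.replace('/', '-').strip('-')
```

In the spider this would be a method (or a static method) so that self.norm_url(response.url) resolves.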
settings.py
BOT_NAME = 'splash'
SPIDER_MODULES = ['splash.spiders']
NEWSPIDER_MODULE = 'splash.spiders'
ROBOTSTXT_OBEY = False
DOWNLOAD_DELAY = 2
SPLASH_URL = 'http://172.35.0.1:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
Log:
docker-compose -f $(pwd)/tor/docker-compose.yml up -d
Creating network "tor_tor_net" with driver "bridge"
Creating tor ... done
docker-compose -f $(pwd)/splash/docker-compose.yml up -d
Creating network "splash_splash_net" with driver "bridge"
Creating splash ... done
docker run --rm --env-file .secrets --env-file .settings \
    --name cisco-crawler \
    -e DATA_DIR=/data \
    -v $(pwd)/data/crawlers/cisco:/data \
    -v $(pwd)/crawlers:/code \
    docker-secdev.ptsecurity.ru/crawler-base:fixed_cisco_crawler \
    cisco
2020-08-18 15:51:01 [scrapy.utils.log] INFO: Scrapy 2.3.0 started (bot: splash)
2020-08-18 15:51:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.2.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 20.3.0, Python 3.7.8 (default, Aug 5 2020, 08:30:24) - [GCC 8.3.0], pyOpenSSL 19.1.0 (OpenSSL 1.1.1g 21 Apr 2020), cryptography 3.0, Platform Linux-5.4.0-42-generic-x86_64-with-debian-10.5
2020-08-18 15:51:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2020-08-18 15:51:01 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'splash',
'DOWNLOAD_DELAY': 2,
'DUPEFILTER_CLASS': 'scrapy_splash.SplashAwareDupeFilter',
'NEWSPIDER_MODULE': 'splash.spiders',
'SPIDER_MODULES': ['splash.spiders']}
2020-08-18 15:51:01 [scrapy.extensions.telnet] INFO: Telnet Password: a2db66b61d0f2c0d
2020-08-18 15:51:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Android
2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:03 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV13, Brand: Sony, Model: BDV13
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
2020-08-18 15:51:04 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV14, Brand: Sony, Model: BDV14
2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LG Web0S SmartTV, Brand: LG, Model: Web0S SmartTV
2020-08-18 15:51:05 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Sony BDV11, Brand: Sony, Model: BDV11
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:06 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Nintendo DSi, Brand: Nintendo, Model: DSi
2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF LF-2403N, Brand: LYF, Model: LF-2403N
2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: LYF F90M, Brand: LYF, Model: F90M
2020-08-18 15:51:07 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Applebot
2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:08 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Applebot
2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Other
2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:09 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: SMTBot
2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: WebKit Nightly
2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:10 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: PhantomJS
2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Robot
2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedDeviceType] Family: Other, Brand: None, Model: None
2020-08-18 15:51:11 [scrapy_user_agents.user_agent_picker] WARNING: [UnsupportedBrowserType] Family: Zune
2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy_user_agents.middlewares.RandomUserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy_splash.SplashCookiesMiddleware',
'scrapy_splash.SplashMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy_splash.SplashDeduplicateArgsMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-08-18 15:51:11 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-08-18 15:51:11 [scrapy.core.engine] INFO: Spider opened
2020-08-18 15:51:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-08-18 15:51:11 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-08-18 15:51:11 [py.warnings] WARNING: /usr/local/lib/python3.7/site-packages/scrapy_splash/request.py:41: ScrapyDeprecationWarning: Call to deprecated function to_native_str. Use to_unicode instead.
url = to_native_str(url)
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
2020-08-18 15:51:11 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
2020-08-18 15:51:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
2020-08-18 15:51:31 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 5.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36
2020-08-18 15:51:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
2020-08-18 15:51:31 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
2020-08-18 15:51:38 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
2020-08-18 15:51:38 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
2020-08-18 15:51:41 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
2020-08-18 15:51:41 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36
2020-08-18 15:51:46 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
2020-08-18 15:51:46 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
2020-08-18 15:52:06 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 2 times): 502 Bad Gateway
2020-08-18 15:52:06 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.170 Safari/537.36
2020-08-18 15:52:06 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (failed 1 times): 502 Bad Gateway
2020-08-18 15:52:06 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36
2020-08-18 15:52:11 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-08-18 15:52:51 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (failed 1 times): 504 Gateway Time-out
2020-08-18 15:52:51 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
2020-08-18 15:52:59 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (failed 3 times): 502 Bad Gateway
2020-08-18 15:52:59 [scrapy.core.engine] DEBUG: Crawled (502) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb via http://172.35.0.1:8050/render.html> (referer: None)
2020-08-18 15:52:59 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-code-exec-wH3BNFb>: HTTP status code is not handled or not allowed
2020-08-18 15:53:11 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (failed 3 times): 504 Gateway Time-out
2020-08-18 15:53:11 [scrapy.core.engine] DEBUG: Crawled (504) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU via http://172.35.0.1:8050/render.html> (referer: None)
2020-08-18 15:53:11 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <504 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-dcnm-stored-xss-yJyqBJGU>: HTTP status code is not handled or not allowed
2020-08-18 15:53:11 [scrapy.extensions.logstats] INFO: Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2020-08-18 15:53:25 [scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (failed 3 times): 502 Bad Gateway
2020-08-18 15:53:25 [scrapy.core.engine] DEBUG: Crawled (502) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR via http://172.35.0.1:8050/render.html> (referer: None)
2020-08-18 15:53:25 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-ucs-director-xss-O7T8ORYR>: HTTP status code is not handled or not allowed
2020-08-18 15:53:39 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (failed 2 times): 504 Gateway Time-out
2020-08-18 15:53:39 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36
2020-08-18 15:54:11 [scrapy.extensions.logstats] INFO: Crawled 3 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2020-08-18 15:54:21 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (failed 2 times): 504 Gateway Time-out
2020-08-18 15:54:21 [scrapy_user_agents.middlewares] DEBUG: Assigned User-Agent Mozilla/5.0 (Windows NT 6.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
2020-08-18 15:55:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190501-asaftd-saml-vpn via http://172.35.0.1:8050/render.html> (referer: None)
===
tools.cisco.com-security-center-content-CiscoSecurityAdvisory-cisco-sa-20190501-asaftd-saml-vpn
===
2020-08-18 15:55:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20190807-wms-oredirect via http://172.35.0.1:8050/render.html> (referer: None)
===
tools.cisco.com-security-center-content-CiscoSecurityAdvisory-cisco-sa-20190807-wms-oredirect
===
2020-08-18 15:55:05 [scrapy.core.engine] INFO: Closing spider (finished)
2020-08-18 15:55:05 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 12060,
'downloader/request_count': 15,
'downloader/request_method_count/POST': 15,
'downloader/response_bytes': 454960,
'downloader/response_count': 15,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/502': 9,
'downloader/response_status_count/504': 4,
'elapsed_time_seconds': 233.602478,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 8, 18, 15, 55, 5, 265746),
'httperror/response_ignored_count': 3,
'httperror/response_ignored_status_count/502': 2,
'httperror/response_ignored_status_count/504': 1,
'log_count/DEBUG': 35,
'log_count/ERROR': 3,
'log_count/INFO': 16,
'log_count/WARNING': 46,
'memusage/max': 59252736,
'memusage/startup': 58208256,
'response_received_count': 5,
'retry/count': 10,
'retry/max_reached': 3,
'retry/reason_count/502 Bad Gateway': 7,
'retry/reason_count/504 Gateway Time-out': 3,
'scheduler/dequeued': 20,
'scheduler/dequeued/memory': 20,
'scheduler/enqueued': 20,
'scheduler/enqueued/memory': 20,
'splash/render.html/request_count': 5,
'splash/render.html/response_count/200': 2,
'splash/render.html/response_count/502': 9,
'splash/render.html/response_count/504': 4,
'start_time': datetime.datetime(2020, 8, 18, 15, 51, 11, 663268)}
2020-08-18 15:55:05 [scrapy.core.engine] INFO: Spider closed (finished)
docker-compose -f $(pwd)/splash/docker-compose.yml down
Stopping splash ... done
Removing splash ... done
Removing network splash_splash_net
docker-compose -f $(pwd)/tor/docker-compose.yml down
Stopping tor ... done
Removing tor ... done
Removing network tor_tor_net
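The counters in the stats dump can be cross-checked against each other, which also accounts for pages that first report 502 or 504 and are later saved. The sketch below only re-adds numbers copied from the log above; the variable names are made up for illustration.

```python
# Numbers copied from the "Dumping Scrapy stats" block of the log.
initial_requests = 5                   # one SplashRequest per URL in start_requests
retries = {"502": 7, "504": 3}         # retry/reason_count/*
ignored = {"502": 2, "504": 1}         # httperror/response_ignored_status_count/*
saved = 2                              # downloader/response_status_count/200

# Every retry is one extra request on top of the initial five.
total_requests = initial_requests + sum(retries.values())

# A 502/504 response was either retried or, after the last attempt, ignored.
responses_502 = retries["502"] + ignored["502"]
responses_504 = retries["504"] + ignored["504"]

# Each URL ends up either saved (200) or given up on.
final_outcomes = saved + sum(ignored.values())

print(total_requests, responses_502, responses_504, final_outcomes)
# prints: 15 9 4 5
```

These match downloader/request_count (15), the 502 and 504 response counts (9 and 4), and response_received_count (5). In the log, cisco-sa-20190501-asaftd-saml-vpn fails with 504 at 15:52:51 and 15:54:21 and is then crawled with status 200 at 15:55:03: the retry middleware re-sent the request, and the third attempt succeeded.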
Solution
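No answer was recorded for this question. As a starting point only: the log shows Scrapy's RetryMiddleware giving up after three attempts, which matches the default RETRY_TIMES of 2 retries per request. If the 502/504 responses are transient (for example, Splash being overloaded or a render exceeding its timeout; the log alone does not confirm the cause), allowing more retries and sending fewer requests at once may help. The values below are illustrative assumptions, not tested against this site.

```python
# settings.py additions (illustrative values, not tuned for this site)

# Retry each failed request up to 5 times instead of the default 2.
RETRY_TIMES = 5

# Status codes that trigger a retry; 502 and 504 are already in Scrapy's
# default list, they are spelled out here only to make the intent explicit.
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]

# Send fewer requests to Splash at the same time (Scrapy's default is 16).
CONCURRENT_REQUESTS = 2
```

Pages that still fail after the last retry are dropped by HttpErrorMiddleware, as the "Ignoring response" lines in the log show, so they never reach save_response.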