How to specify the TLS version or disable TLS verification in Scrapy

Problem description

The site: https://www.cnbanbao.cn/

I tried this command on my Mac:

openssl s_client  -connect www.cnbanbao.cn:443 -msg

Result:

>>> TLS 1.2 Handshake [length 00c3], ClientHello
...
<<< TLS 1.2 Handshake [length 0051], ServerHello
...
<<< TLS 1.0 Handshake [length 0a4a], Certificate
...
<<< TLS 1.0 Handshake [length 0004], ServerHelloDone
...
>>> TLS 1.0 Handshake [length 0106], ClientKeyExchange
...
>>> TLS 1.0 ChangeCipherSpec [length 0001]
...
>>> TLS 1.0 Handshake [length 0010], Finished
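
To make the transcript easier to scan, here is a small sketch that pulls the direction, the version label and the message name out of each line of the openssl s_client -msg output above. It only parses text and needs no network access. The regular expression is an assumption based on the line format shown here, not on OpenSSL documentation. Record, LINE_RE and parse_msg_transcript are hypothetical names.

```python
import re
from collections import namedtuple

# One TLS message as printed by `openssl s_client -msg`.
Record = namedtuple("Record", "direction version content_type length message")

# Matches lines such as "<<< TLS 1.0 Handshake [length 0a4a], Certificate".
# ">>>" marks a message sent by the client, "<<<" one received from the server.
LINE_RE = re.compile(
    r"^(?P<direction>>>>|<<<)\s+TLS\s+(?P<version>\d\.\d)\s+"
    r"(?P<content_type>\w+)\s+\[length\s+(?P<length>[0-9a-fA-F]+)\]"
    r"(?:,\s*(?P<message>\w+))?"
)


def parse_msg_transcript(text):
    """Return one Record per TLS line; other lines (such as '...') are skipped."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        records.append(Record(
            direction=m.group("direction"),
            version=m.group("version"),
            content_type=m.group("content_type"),
            length=int(m.group("length"), 16),  # openssl prints the length in hex
            # ChangeCipherSpec lines carry no message name; reuse the content type.
            message=m.group("message") or m.group("content_type"),
        ))
    return records


TRANSCRIPT = """\
>>> TLS 1.2 Handshake [length 00c3], ClientHello
...
<<< TLS 1.2 Handshake [length 0051], ServerHello
...
<<< TLS 1.0 Handshake [length 0a4a], Certificate
...
<<< TLS 1.0 Handshake [length 0004], ServerHelloDone
...
>>> TLS 1.0 Handshake [length 0106], ClientKeyExchange
...
>>> TLS 1.0 ChangeCipherSpec [length 0001]
...
>>> TLS 1.0 Handshake [length 0010], Finished
"""

for r in parse_msg_transcript(TRANSCRIPT):
    print(r.direction, "TLS", r.version, r.message)
```

Printing the records shows the pattern the question refers to: ClientHello and ServerHello carry the TLS 1.2 label, and every later message carries TLS 1.0.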

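I think the problem may be that the site uses TLS 1.2 for the ServerHello but TLS 1.0 for the rest of the handshake. This causes problems when I try to download the site with Scrapy: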

scrapy shell 'https://www.cnbanbao.cn/'
2019-01-24 11:49:57 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.cnbanbao.cn/> (failed 1 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2019-01-24 11:49:58 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.cnbanbao.cn/> (failed 2 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
2019-01-24 11:49:59 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.cnbanbao.cn/> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
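
One way to test this hypothesis outside Scrapy is to use Python's standard ssl module. The sketch below attempts a handshake, optionally pinned to a single TLS version and optionally without certificate verification. It reports either the negotiated protocol (SSLSocket.version()) or the name of the exception raised. The helpers make_context and probe are hypothetical names used only for illustration, and probe needs network access. Recent Python and OpenSSL builds may refuse TLS 1.0 even when asked for it, so an error for ssl.TLSVersion.TLSv1 can reflect the local setup rather than the server.

```python
import socket
import ssl


def make_context(version=None, verify=True):
    """Build a client SSLContext, optionally pinned to one TLS version.

    version: an ssl.TLSVersion member (e.g. ssl.TLSVersion.TLSv1), or None
    to keep the library defaults and negotiate normally.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # hostname check + CERT_REQUIRED
    if verify:
        ctx.load_default_certs()
    else:
        ctx.check_hostname = False  # must be disabled before CERT_NONE is allowed
        ctx.verify_mode = ssl.CERT_NONE
    if version is not None:
        ctx.minimum_version = version
        ctx.maximum_version = version
    return ctx


def probe(host, version=None, verify=True, port=443, timeout=10):
    """Attempt one handshake (needs network access).

    Returns the negotiated protocol, e.g. "TLSv1" or "TLSv1.2", or the name of
    the exception raised, e.g. "SSLCertVerificationError" or "SSLError".
    """
    ctx = make_context(version, verify)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as ssock:
                return ssock.version()
    except OSError as exc:  # ssl.SSLError, timeouts and resets are all OSError
        return type(exc).__name__
```

For example, compare probe("www.cnbanbao.cn"), probe("www.cnbanbao.cn", ssl.TLSVersion.TLSv1) and probe("www.cnbanbao.cn", verify=False). A verification error that disappears with verify=False suggests a certificate problem. Failures that persist regardless of verify suggest a version or cipher negotiation problem.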

I tried specifying TLS 1.0 as suggested in "SSL issue when scraping website", but it doesn't seem to work.
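
For reference, Scrapy documents a DOWNLOADER_CLIENT_TLS_METHOD setting for its default HTTP/1.1 download handler. Its values are "TLS" (the default, which negotiates), "TLSv1.0", "TLSv1.1" and "TLSv1.2". The following is a minimal settings sketch, assuming a Scrapy version that supports this setting. SPIDER_CUSTOM_SETTINGS is a hypothetical name. Whether pinning TLS 1.0 is enough for this particular site is not verified here, and a local OpenSSL build that disables TLS 1.0 would still refuse it.

```python
# settings.py of the Scrapy project (sketch).

# Make the default HTTP/1.1 download handler offer TLS 1.0 only, instead of
# negotiating the highest version both sides support ("TLS", the default).
# Documented values: "TLS", "TLSv1.0", "TLSv1.1", "TLSv1.2".
DOWNLOADER_CLIENT_TLS_METHOD = "TLSv1.0"

# Per-spider alternative (hypothetical name): assign this dict to the
# `custom_settings` attribute of a scrapy.Spider subclass instead.
SPIDER_CUSTOM_SETTINGS = {"DOWNLOADER_CLIENT_TLS_METHOD": DOWNLOADER_CLIENT_TLS_METHOD}
```

Scrapy commands accept -s NAME=VALUE, so the same setting can be tried without editing any file: scrapy shell -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0 'https://www.cnbanbao.cn/'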

I also tried the approach from "Disable SSL certificate verification in Scrapy", but I don't know how to define HttpsDownloaderIgnoreCNError to disable SSL verification.
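
According to the Scrapy documentation, the default context factory does not verify server certificates. That factory is ScrapyClientContextFactory, the default value of DOWNLOADER_CLIENTCONTEXTFACTORY. A custom class like HttpsDownloaderIgnoreCNError is therefore probably not needed just to skip verification, and a ConnectionLost during the handshake may have a different cause. The settings sketch below keeps the default factory explicit and turns on extra TLS logging. DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING is assumed to exist only in newer Scrapy releases (1.8 and later).

```python
# settings.py (sketch).

# The documented default, spelled out: this factory does not verify server
# certificates, so no custom "ignore CN error" factory is needed for that.
DOWNLOADER_CLIENTCONTEXTFACTORY = (
    "scrapy.core.downloader.contextfactory.ScrapyClientContextFactory"
)

# Assumption: Scrapy 1.8+. Logs TLS connection parameters at DEBUG level once
# an HTTPS connection has been established.
DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING = True
```

Because the verbose messages are only emitted after a successful connection, they are most useful for confirming which version and cipher were negotiated once a fix works.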

Any ideas on how to make the following command work?

scrapy shell 'https://www.cnbanbao.cn/'

Tags: python, ssl, scrapy
