How to scrape a website without disabling SSL

Problem description

I have to scrape a website without disabling SSL verification. I tried using the Nokogiri gem:

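require 'httparty'
require 'nokogiri'
require 'open-uri'

# Certificate verification is switched off here, which is what the question wants to avoid.
# URI.open replaces the bare open call, which no longer accepts URLs on Ruby 3.0 and later.
page = URI.open("https://mywebsiteurl.com", { ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE })
doc = Nokogiri::HTML(page)
puts doc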

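This code works only because it disables SSL certificate verification. I want it to work with verification left on.

When I try it without disabling verification, I get this error: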

SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)

When I run curl https://mywebsiteurl.com, I get this result:

* Hostname was NOT found in DNS cache
*   Trying xxx.xxx.xxx.xxx...
* Connected to wxxxxxxxxx.com (xxx.xxx.xxx.xxx) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS alert, Server hello (2):
* SSL certificate problem: certificate has expired
* Closing connection 0
curl: (60) SSL certificate problem: certificate has expired
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

Tags: ruby-on-rails, ssl, web-scraping, nokogiri

Solution
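
The curl output in the question reports "certificate has expired", which suggests the failure comes from the server's certificate rather than from Nokogiri or open-uri. The following is a minimal sketch, assuming only Ruby's standard openssl and socket libraries, of one way to confirm the expiry date. The helper names certificate_expired? and fetch_peer_certificate are hypothetical, chosen for illustration. The diagnostic handshake skips verification only so that the certificate's dates can be read, and it transfers no page data.

```ruby
require 'openssl'
require 'socket'

# Returns true when `now` is later than the certificate's notAfter date.
def certificate_expired?(cert, now = Time.now)
  now > cert.not_after
end

# Hypothetical diagnostic helper: performs a TLS handshake and returns the
# server's leaf certificate. Verification is skipped here on purpose, and only
# here, so that an expired certificate can still be inspected.
def fetch_peer_certificate(host, port = 443)
  context = OpenSSL::SSL::SSLContext.new
  context.verify_mode = OpenSSL::SSL::VERIFY_NONE
  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, context)
  ssl.hostname = host # send SNI so the server picks the right certificate
  ssl.connect
  ssl.peer_cert
ensure
  ssl&.close
  tcp&.close
end

# Example usage (needs network access, so it is left commented out):
#   cert = fetch_peer_certificate('mywebsiteurl.com')
#   puts "notAfter: #{cert.not_after}"
#   puts 'certificate has expired' if certificate_expired?(cert)
```

If the check confirms that the certificate has expired, no client-side setting will make verification pass while it stays enabled; the certificate would have to be renewed on the server. Once it is valid again, the scraping code from the question should work with the ssl_verify_mode option removed.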

