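ruby-on-rails - How to scrape a website without disabling SSL verification
Problem description
I need to scrape a website without disabling SSL certificate verification. I tried using the Nokogiri gem: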
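require 'nokogiri'
require 'open-uri'
require 'openssl'
# Kernel#open no longer opens URLs in Ruby 3.0+; URI.open does.
# WARNING: VERIFY_NONE turns off certificate verification, so any certificate
# (expired, self-signed, or forged) is silently accepted.
page = URI.open("https://mywebsiteurl.com", { ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE })
doc = Nokogiri::HTML(page)
puts doc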
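This code works, but only because it disables SSL certificate verification. I want it to work with verification enabled.
When I leave verification on, I get this error: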
SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
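A minimal sketch of the same fetch with certificate verification left on, assuming open-uri's `ssl_verify_mode` and `ssl_ca_cert` options. The helper names `open_uri_ssl_options` and `fetch_html` are hypothetical; `ca_file` plays the role of curl's `--cacert` and should only be passed when the system trust store is missing the site's CA.

```ruby
require 'open-uri'
require 'openssl'

# Builds the options hash for URI.open with certificate verification ON.
# ca_file (hypothetical parameter) is an optional PEM bundle path, like
# curl's --cacert; omit it to use the system's default trust store.
def open_uri_ssl_options(ca_file = nil)
  options = { ssl_verify_mode: OpenSSL::SSL::VERIFY_PEER }
  options[:ssl_ca_cert] = ca_file if ca_file
  options
end

# Downloads a page body as a String with verification enabled.
# A certificate problem is re-raised with the URL in the message
# instead of being ignored.
def fetch_html(url, ca_file: nil)
  URI.open(url, open_uri_ssl_options(ca_file), &:read)
rescue OpenSSL::SSL::SSLError => e
  raise OpenSSL::SSL::SSLError, "TLS verification failed for #{url}: #{e.message}"
end
```

Usage with the question's placeholder URL: `doc = Nokogiri::HTML(fetch_html("https://mywebsiteurl.com"))`. A CA bundle only helps when the issuing CA is unknown to the client; if the certificate itself is invalid (for example expired, as the curl output reports), verification keeps failing until the site renews its certificate.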
When I run curl https://mywebsiteurl.com
I get this output:
* Hostname was NOT found in DNS cache
* Trying xxx.xxx.xxx.xxx...
* Connected to wxxxxxxxxx.com (xxx.xxx.xxx.xxx) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS alert, Server hello (2):
* SSL certificate problem: certificate has expired
* Closing connection 0
curl: (60) SSL certificate problem: certificate has expired
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
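The curl output points at the actual cause: the server's certificate has expired, so changing the CA bundle cannot make standard verification pass. A minimal sketch for confirming the expiry date from Ruby, assuming only Ruby's bundled `openssl` and `socket` libraries; `fetch_peer_certificate` and `certificate_expired?` are hypothetical helper names. Verification is disabled inside `fetch_peer_certificate` solely to read the certificate for diagnosis, never to download page content.

```ruby
require 'openssl'
require 'socket'

# Returns true when the certificate's validity period ended before `now`.
def certificate_expired?(cert, now = Time.now)
  cert.not_after < now
end

# Connects to host:port and returns the server's leaf certificate.
# Verification is switched off ONLY so an expired certificate can still be
# read for diagnosis; do not reuse this connection to fetch page content.
def fetch_peer_certificate(host, port = 443)
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE
  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
  ssl.hostname = host # send SNI so virtual hosts present the right certificate
  ssl.connect
  ssl.peer_cert
ensure
  ssl&.close
  tcp&.close
end
```

For example, `fetch_peer_certificate("mywebsiteurl.com").not_after` returns the time the certificate expired, and `certificate_expired?` on the same object should return true if curl's diagnosis is correct.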