python - python请求问题:cloudflare错误消息“启用cookies”
问题描述
我正计划为 Sneakersnstuff.com 网站创建一个基本的网络爬虫,但由于一个错误,我的努力提前停止了。当向 url https://www.sneakersnstuff.com/请求时,而不是显示网站的 html,甚至是入口验证码,我被重定向到带有错误消息“启用 cookie”的 cloudflare 页面。我的代码和响应都如下所示
import requests
import cfscrape
session = requests.session()
response = session.get('https://www.sneakersnstuff.com/')
print(response.headers)
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!-->
<html class="no-js" lang="en-US">
<!--<![endif]-->
<head>
<title>Access denied | www.sneakersnstuff.com used Cloudflare to restrict access</title>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css"
media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">
body {
margin: 0;
padding: 0
}
</style>
<!--[if gte IE 10]><!-->
<script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script>
<!--<![endif]-->
<!--[if gte IE 10]><!-->
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>
<!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please
enable cookies.</div>
<div id="cf-error-details" class="cf-error-details-wrapper">
<div class="cf-wrapper cf-header cf-error-overview">
<h1>
<span class="cf-error-type" data-translate="error">Error</span>
<span class="cf-error-code">1020</span>
<small class="heading-ray-id">Ray ID: 578133293d83e0d6 • 2020-03-22 16:13:25 UTC</small>
</h1>
<h2 class="cf-subheadline">Access denied</h2>
</div><!-- /.header -->
<section></section><!-- spacer -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="what_happened">What happened?</h2>
<p>This website is using a security service to protect itself from online attacks.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper">
<p>
<span class="cf-footer-item">Cloudflare Ray ID: <strong>578133293d83e0d6</strong></span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Your IP</span>: 96.241.108.243</span>
<span class="cf-footer-separator">•</span>
<span class="cf-footer-item"><span>Performance & security by</span> <a
href="https://www.cloudflare.com/5xx-error-landing?utm_source=error_footer" id="brand_link"
target="_blank">Cloudflare</a></span>
</p>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script type="text/javascript">
window._cf_translation = {};
</script>
</body>
</html>
我曾尝试使用许多名为 cfscrape 的库推荐,但无济于事。
解决方案
添加Browser/User-Agent Filtering
到 cloudcraper 对我有用。
import cloudscraper
from bs4 import BeautifulSoup
# Adding Browser / User-Agent Filtering should help ie.
# will give you only desktop firefox User-Agents on Windows
scraper = cloudscraper.create_scraper(browser={'browser': 'firefox','platform': 'windows','mobile': False})
html = scraper.get("https://www.sneakersnstuff.com/").content
soup = BeautifulSoup(html, 'html.parser')
print(soup)
推荐阅读
- xamarin - Xamarin Frame 只有一个圆角
- matlab - 如何在 Matlab 实时脚本编辑器中显示生成的乳胶表达式?
- sql - Django找到多对多关系的完全匹配
- flutter - 如何在 Flutter 中将颜色转换为字符串?
- histogram - Google 地球引擎中的直方图输出
- php - 无论如何在WordPress中格式化我的端点URL?
- reactjs - ReactNative:无法按下 Animated.View 的事件父组件
- angular - 类型上不存在属性“VAR_PLURAL”
- python-3.x - Pycryptodome Python3 RSA.importKey 阻塞/挂起
- php - Laravel Session 值不同/未在其他控制器(Redis)中更新