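python - How to bypass Cloudflare and reCAPTCHA to get page content
Problem description
I want to scrape a page through a proxy. I use cfscrape to load the page and get past Cloudflare (the first "challenge"), but the page then presents a reCAPTCHA to check whether I am human. That is where I am stuck. I think I need to pass the user agent and cookies (my code may also be wrong), but I don't know how to do that.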
import cfscrape
from bs4 import BeautifulSoup

# Excerpt from inside a function; get_proxy, proxy_list and use_proxies are defined elsewhere.
link = "https://www.oneblockdown.it/en/footwear-sneakers/adidas/men-unisex/adidas-originals-yeezy-boost-350-v2/9438"
proxies = get_proxy(proxy_list)  # proxies are read from a file
scraper = cfscrape.create_scraper()  # returns a CloudflareScraper instance
# Note: this dict is defined but never passed to scraper.get() below.
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
}
try:
    if use_proxies:
        print("[Proxy]: " + proxies['http'])
    r = scraper.get(link, proxies=proxies)
except Exception:
    print("Connection to URL <" + link + "> failed.")
    return
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())
The final print outputs this response:
'''
<script src="https://www.google.com/recaptcha/api.js?hl=" type="text/javascript">
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.3/jquery.js" type="text/javascript">
</script>
</head>
<body>
<div class="g-recaptcha" data-callback="getCaptchaResult" data-sitekey="6Le49hgUAAAAAIv3wrILeXIrOSdM3_5oxK4C6m48" data-size="invisible">
</div>
<script type="text/javascript">
window.onload = function () { grecaptcha.execute(); };
function getCaptchaResult(response) {
$.post("/index.php", {action: "captcha_verify", captcha: response, version: 37}, function(result){
var timeout = result ? 0 : 2500;
setTimeout(function() {
window.location.reload();
}, timeout);
});
}
</script>
<script type="text/javascript">
window.NREUM||(NREUM={});NREUM.info={"beacon":"bam.nr-data.net","licenseKey":"97b599ea8e","applicationID":"23522071","transactionName":"YFxXbENSCxEFUhVfWlkWdk1CRwoPS1cOWUFAXFRKHEALBwVaBERGGFhRUVVSFg==","queueTime":0,"applicationTime":54,"atts":"TBtUGgtIGB8=","errorBeacon":"bam.nr-data.net","agent":""}
</script>
</body>
</html>
'''
I need to verify that I am human. How can I pass this challenge?
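The response shown in the question is the reCAPTCHA interstitial, not the product page, so a script needs a way to tell the two apart before it parses anything. The following is a minimal sketch of that check only; it does not solve or submit the challenge. The names `RecaptchaFinder` and `find_recaptcha_sitekeys` are hypothetical, and the sketch uses the standard-library `html.parser` rather than BeautifulSoup purely so that it runs without third-party packages. The sample markup is copied from the response in the question.

```python
from html.parser import HTMLParser


class RecaptchaFinder(HTMLParser):
    """Collects data-sitekey values from elements whose class includes "g-recaptcha"."""

    def __init__(self):
        super().__init__()
        self.sitekeys = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        if "g-recaptcha" in classes:
            self.sitekeys.append(attr_map.get("data-sitekey"))


def find_recaptcha_sitekeys(html):
    """Return the site keys of reCAPTCHA widgets in html (empty list if none)."""
    finder = RecaptchaFinder()
    finder.feed(html)
    return finder.sitekeys


# Markup taken from the response in the question.
sample = '''
<body>
<div class="g-recaptcha" data-callback="getCaptchaResult" data-sitekey="6Le49hgUAAAAAIv3wrILeXIrOSdM3_5oxK4C6m48" data-size="invisible">
</div>
</body>
'''

print(find_recaptcha_sitekeys(sample))
print(find_recaptcha_sitekeys("<body><h1>Product page</h1></body>"))
```

In the question's code this could be called as `find_recaptcha_sitekeys(r.text)`: a non-empty result means the server returned the CAPTCHA page instead of the content.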
Solution