python - Use requests to post captcha answer
Problem description
I have a captcha that I need to solve, and I want to send the answer with the Requests POST method, but I'm not sure how to start. The form looks like this:
<form name="captchaForm" action="/captcha/eval.do" onsubmit="document.captchaForm.hash.value=window.location.hash;">
<img id="captchaTag" src="/captcha/captcha.do?id=132.204.251.80">
<audio id="audioCaptchaTag" style="display: none;" preload="auto" onplay="$('#captchaResponse').focus();" controls="" src="/captcha/audioCaptcha.do?id=132.204.251.80&lang=fr&cacheBuster=1aca7bbb658bf40435965772b3343124f" type="audio/wav"></audio>
<button id="toggleAudio" alt="Click here to access the audible captcha" type="button" onclick="$('#captchaTag').toggle(); $('#audioCaptchaTag').toggle(); $('#captchaResponse').focus(); "><i class="fas fa-headphones"></i></button>
<input autocapitalize="none" id="captchaResponse" style="margin: 0;" type="text" name="jcaptcha" value="" autofocus="">
<input type="hidden" name="path" value="/fr/qc/qcrdl/doc/2009/2009canlii97762/2009canlii97762.html">
<input type="hidden" name="queryString" value="">
<input type="hidden" name="hash" value="">
<input type="hidden" name="id" value="132.204.251.80">
<input type="submit" value="ok">
</form>
This is my script so far. Basically, I want to send the input with the POST method:
from urllib.parse import urljoin

import cv2
import numpy as np
import requests
from bs4 import BeautifulSoup

# url: the page being scraped (defined elsewhere in the script)
# The Access-Control-* entries were response headers, so only User-Agent is sent.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
}
session = requests.Session()
req = session.get(url, headers=headers)
soup = BeautifulSoup(req.content, "lxml")
captcha = soup.find("form", {"name": "captchaForm"})
if captcha is None:
    pass  # Do stuff
else:
    print(".......... YOU'VE HIT A CAPTCHA")
    captchaSrc = soup.find("img", {"id": "captchaTag"}).get("src")
    # src is a relative URL: download the image, then decode the bytes for OpenCV
    imageBytes = session.get(urljoin(url, captchaSrc), headers=headers).content
    captchaImage = cv2.imdecode(np.frombuffer(imageBytes, np.uint8), cv2.IMREAD_COLOR)
    cv2.imshow("captcha", captchaImage)
    cv2.waitKey()
    captchaInput = input("Please enter captcha: ")
    # the original had `continue` here, which is only valid inside a loop
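One possible way to send the typed answer back is to copy every named input of the form, fill in `jcaptcha`, and submit to the form's `action`. The sketch below is only an illustration under assumptions: the helper names `build_captcha_request` and `submit_captcha` are made up here, `form` is expected to be the BeautifulSoup tag found as `captcha` in the script, and `session` a `requests.Session`. Note that the form shown declares no `method` attribute, so a browser would submit it with GET; whether the server also accepts POST is not confirmed anywhere in this post, so the sketch follows the declared method.

```python
from urllib.parse import urljoin


def build_captcha_request(page_url, form, answer):
    """Return (method, target_url, fields) for submitting the captcha form.

    `form` is assumed to be a BeautifulSoup Tag for <form name="captchaForm">.
    """
    fields = {}
    for tag in form.find_all("input"):
        name = tag.get("name")
        if name:  # the submit button has no name attribute, so it is skipped
            fields[name] = tag.get("value", "")
    # "jcaptcha" is the name of the text input in the form above.
    fields["jcaptcha"] = answer
    # "hash" is normally filled in by JavaScript (onsubmit); here it stays empty.
    # An HTML form without a method attribute is submitted with GET.
    method = (form.get("method") or "get").lower()
    target_url = urljoin(page_url, form.get("action", ""))
    return method, target_url, fields


def submit_captcha(session, page_url, form, answer, headers=None):
    """Send the answer through the same session that received the captcha."""
    method, target_url, fields = build_captcha_request(page_url, form, answer)
    if method == "post":
        return session.post(target_url, data=fields, headers=headers)
    return session.get(target_url, params=fields, headers=headers)
```

With the names from the script, the call would be `submit_captcha(session, url, captcha, captchaInput, headers)`. Reusing the same `session` matters because any cookies set while the captcha was served are sent back with the answer.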
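Solution
I came across this post while trying to scrape CanLII myself. I haven't managed to solve the captcha, but I found that scraping while simultaneously browsing through a webdriver reduced the number of captchas by roughly a factor of 20. By switching my IP address every time I hit a captcha, I managed to bring my scraping uptime to 100%. In case you could use some inspiration, here is some code: https://github.com/isovector/canlii-scraper/blob/23ba62bde125c150f152c710a7b5506aa88d2555/src/Lib.hs#L35-L46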
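The idea of changing IP address whenever a captcha appears could be sketched in Python roughly as follows. This is not the linked Haskell code, only an illustration under assumptions: the proxy addresses are placeholders, `fetch_with_rotation` and `looks_like_captcha` are hypothetical helper names, and `session` is expected to be a `requests.Session`, whose `get` accepts the `proxies` and `timeout` arguments used here.

```python
from itertools import cycle

# Placeholder proxy endpoints (assumption); replace with proxies you control.
PROXIES = ["http://proxy-a.example:8080", "http://proxy-b.example:8080"]


def looks_like_captcha(html):
    """Crude check for the captcha form shown in the question."""
    return 'name="captchaForm"' in html


def fetch_with_rotation(session, url, proxy_urls=PROXIES, max_attempts=5, headers=None):
    """Fetch url, moving to the next proxy whenever the captcha page comes back."""
    pool = cycle(proxy_urls)
    for _ in range(max_attempts):
        proxy = next(pool)
        response = session.get(
            url,
            headers=headers,
            proxies={"http": proxy, "https": proxy},
            timeout=30,
        )
        if not looks_like_captcha(response.text):
            return response
    raise RuntimeError("captcha returned on every attempt")
```

The string check is deliberately simple; parsing the page and looking for the `captchaForm` element, as the question's script does, would be more robust.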