Use requests to post captcha answer

Problem description

I have a captcha that I need to solve, and I want to send the answer with the Requests POST method, but I'm not sure how to start. The page contains this form:

<form name="captchaForm" action="/captcha/eval.do" onsubmit="document.captchaForm.hash.value=window.location.hash;">
        <img id="captchaTag" src="/captcha/captcha.do?id=132.204.251.80">
        <audio id="audioCaptchaTag" style="display: none;" preload="auto" onplay="$('#captchaResponse').focus();" controls="" src="/captcha/audioCaptcha.do?id=132.204.251.80&amp;lang=fr&amp;cacheBuster=1aca7bbb658bf40435965772b3343124f" type="audio/wav"></audio>
        <button id="toggleAudio" alt="Click here to access the audible captcha" type="button" onclick="$('#captchaTag').toggle(); $('#audioCaptchaTag').toggle(); $('#captchaResponse').focus(); "><i class="fas fa-headphones"></i></button>
        <input autocapitalize="none" id="captchaResponse" style="margin: 0;" type="text" name="jcaptcha" value="" autofocus="">
        <input type="hidden" name="path" value="/fr/qc/qcrdl/doc/2009/2009canlii97762/2009canlii97762.html">
        <input type="hidden" name="queryString" value="">
        <input type="hidden" name="hash" value="">
        <input type="hidden" name="id" value="132.204.251.80">
        <input type="submit" value="ok">
</form>

This is my script so far. Basically, I want to send the input with the POST method:

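import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import cv2
import numpy as np

# Only request headers belong here. The Access-Control-* entries were
# response headers that servers send, so they are dropped.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }

# Placeholder: the original script never defined url.
url = "https://example.com/page-to-scrape"

session = requests.Session()
req = session.get(url, headers=headers)
soup = BeautifulSoup(req.content, "lxml")
captcha = soup.find("form", {"name": "captchaForm"})
if captcha is None:
    pass  # Do stuff
else:
    print(".......... YOU'VE HIT A CAPTCHA")
    # src is a relative URL, so download the image with the same session
    captchaSrc = soup.find("img", {"id": "captchaTag"}).get("src")
    imageBytes = session.get(urljoin(url, captchaSrc), headers=headers).content
    # cv2.imshow needs a decoded image array, not a URL string
    captchaImage = cv2.imdecode(np.frombuffer(imageBytes, np.uint8), cv2.IMREAD_COLOR)
    cv2.imshow("captcha", captchaImage)
    cv2.waitKey()
    captchaInput = input("Please enter captcha: ")
    # The answer still has to be sent to the form's action URL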
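
The step the script is missing is the submission itself. A minimal sketch of how it might look is below, assuming the server accepts the same fields a browser would send for the form shown earlier. The helper names `build_captcha_submission` and `submit_captcha` are hypothetical. Note that the form declares no `method` attribute, so a browser would submit it with GET; whether the endpoint also accepts POST is not something the page confirms, which is why the sketch follows whatever method the form declares.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def build_captcha_submission(html, page_url, answer):
    """Return (method, action_url, fields) for the captcha form, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", {"name": "captchaForm"})
    if form is None:
        return None
    # Copy every named input (path, queryString, hash, id, ...) with its value.
    # The submit button has no name, so it is skipped.
    fields = {
        tag["name"]: tag.get("value", "")
        for tag in form.find_all("input")
        if tag.get("name")
    }
    # The visible text box is named "jcaptcha"; the typed answer goes there.
    fields["jcaptcha"] = answer
    # The page fills "hash" with JavaScript on submit; requests runs no
    # JavaScript, so it stays empty here (assumed acceptable to the server).
    # Browsers submit a form without a method attribute using GET.
    method = form.get("method", "get").lower()
    action_url = urljoin(page_url, form.get("action", ""))
    return method, action_url, fields


def submit_captcha(session, html, page_url, answer, headers=None):
    """Send the answer through the same session so its cookies are reused."""
    submission = build_captcha_submission(html, page_url, answer)
    if submission is None:
        return None
    method, action_url, fields = submission
    if method == "post":
        return session.post(action_url, data=fields, headers=headers)
    return session.get(action_url, params=fields, headers=headers)
```

With the script above, this would be called as `submit_captcha(session, req.text, url, captchaInput, headers)`. Reusing the same `requests.Session()` matters because the captcha image and the answer are most likely tied together by a cookie.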

Tags: python, post, captcha, python-requests-html

Solution


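I came across this question while trying to scrape CanLII. I haven't managed to solve the captcha, but I found that browsing through a webdriver while scraping reduced the number of captchas by a factor of about 20. By switching my IP address every time I hit a captcha, I managed to bring my scraping uptime to 100%. If you could use some inspiration, here is some code: https://github.com/isovector/canlii-scraper/blob/23ba62bde125c150f152c710a7b5506aa88d2555/src/Lib.hs#L35-L46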
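
The IP-switching idea could be sketched in Python roughly as follows. This is not a port of the linked Haskell code; it assumes the IP change is done through a list of HTTP proxies, and both the helper names and the proxy URLs in the usage note are hypothetical. `session` is expected to behave like a `requests.Session`, whose `get` method accepts a `proxies` mapping.

```python
from itertools import cycle


def is_captcha_page(html):
    """Cheap check for the captcha form shown in the question."""
    return 'name="captchaForm"' in html


def fetch_with_rotation(session, url, proxy_urls, max_attempts=5):
    """Fetch url, moving to the next proxy each time a captcha page comes back.

    Returns the page text, or None if every attempt hit a captcha.
    """
    proxies = cycle(proxy_urls)
    for _ in range(max_attempts):
        proxy = next(proxies)
        # requests picks the proxy by URL scheme, so both schemes are mapped.
        response = session.get(
            url,
            proxies={"http": proxy, "https": proxy},
            timeout=30,
        )
        if not is_captcha_page(response.text):
            return response.text
    return None
```

A call might look like `fetch_with_rotation(requests.Session(), url, ["http://proxy-a:8080", "http://proxy-b:8080"])`, where the two proxy addresses are placeholders.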

