How to get an image from https://exampe.com/captcha.ashx

Problem description

I am trying to fetch a captcha image from an external site, but I always get an HTML response instead.

The HTML returned is:

<html> 
<head>
<title>TuEnvio</title>
  <style> body { background-color: #dfe6e9; margin: 0; position: absolute; top: 50%; left: 50%; -ms-transform: translate(-50%, -50%); transform: translate(-50%, -50%); } .lds-grid { display: inline-block; position: relative; width: 80px; height: 80px; } .lds-grid div { position: absolute; width: 16px; height: 16px; border-radius: 50%; background: #d63031; animation: lds-grid 1.2s linear infinite; } .lds-grid div:nth-child(1) { top: 8px; left: 8px; animation-delay: 0s; } .lds-grid div:nth-child(2) { top: 8px; left: 32px; animation-delay: -0.4s; } .lds-grid div:nth-child(3) { top: 8px; left: 56px; animation-delay: -0.8s; } .lds-grid div:nth-child(4) { top: 32px; left: 8px; animation-delay: -0.4s; } .lds-grid div:nth-child(5) { top: 32px; left: 32px; animation-delay: -0.8s; } .lds-grid div:nth-child(6) { top: 32px; left: 56px; animation-delay: -1.2s; } .lds-grid div:nth-child(7) { top: 56px; left: 8px; animation-delay: -0.8s; } .lds-grid div:nth-child(8) { top: 56px; left: 32px; animation-delay: -1.2s; } .lds-grid div:nth-child(9) { top: 56px; left: 56px; animation-delay: -1.6s; } @keyframes lds-grid {  0%, 100% {opacity: 1; }  50% {opacity: 0.5; } }</style>
  </head>
  <body>
  <div class="lds-grid"> 
  <div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div> <div></div></div>
  <script type="text/javascript" src="/aes.min.js">
  </script>
  <script> 
  function toNumbers(d) { 
  var e = []; 
  d.replace(/(..)/g, function (d) {   e.push(parseInt(d, 16)); });
  return e; }
  function toHex() {
  for (   var d = [],     d = 1 == arguments.length && arguments[0].constructor == Array ? arguments[0] : arguments,e = "",f = 0;f < d.length;f++ )e += (16 > d[f] ? "0" : "") + d[f].toString(16);
  return e.toLowerCase();
  }
  var a = toNumbers("d68d69a9a746d20032277ede658ba3ad"), b = toNumbers("58c9e810e2ebcc49ae9ee28af1c6dd53"), c = toNumbers("0102c6e95e39d07a5b4b5bb0b5dcd89c");
  document.cookie = "ASP.KLR=" + toHex(slowAES.decrypt(c, 2, a, b)) + "; expires=Session; path=/";
  location.href = "https://www.tuenvio.cu/matanzas/captcha.ashx?attempt=1";</script>
  </body>
  </html>

My request code is:

readonly HttpClient Client;
readonly CookieContainer CookieContainer;
ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;

CookieContainer = new CookieContainer();

HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = CookieContainer,
    UseCookies = true,
    SslProtocols = SslProtocols.Tls12 | SslProtocols.Tls11 | SslProtocols.Tls,
    ServerCertificateCustomValidationCallback = (sender, certificate, chain, sslPolicyErrors) => true
};

// Create an HttpClient object
Client = new HttpClient(handler);
Client.DefaultRequestHeaders.Add("Origin", BaseUrl);
Client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36");
Client.DefaultRequestHeaders.Accept.Add(new System.Net.Http.Headers.MediaTypeWithQualityHeaderValue("*/*"));
Client.DefaultRequestHeaders.Connection.Add("keep-alive");

public async Task<Image> getImagen(string Uri)
{
    try
    {
        var req = new HttpRequestMessage(HttpMethod.Get, Uri);
        req.Headers.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9");

        var resp = await Client.SendAsync(req);
        if (resp.IsSuccessStatusCode)
        {
            // Image.FromStream needs the stream to stay open for the lifetime of the image
            var bytes = await resp.Content.ReadAsByteArrayAsync();
            var ms = new MemoryStream(bytes);
            return Image.FromStream(ms);
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }
    return null;
}

The image displays fine in a web browser, but with HttpClient I only get the HTML response. How can I solve this?

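OK, here is my new approach. I implemented the toNumbers, toHex, and slowAES.decrypt functions in C#, then repeated the request with the updated URL and the cookie added, just as the JavaScript would do when run by a web browser.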

public async Task<Image> getRecatcha()
{
    try
    {
        string requestUri = BaseUrl + Settings.Tienda + "/captcha.ashx";
        var req = new HttpRequestMessage(HttpMethod.Get, requestUri);

        var resp = await Client.SendAsync(req);
        if (resp.IsSuccessStatusCode)
        {
            var respbody = await resp.Content.ReadAsStringAsync();
            Log(respbody, "_Recaptcha.html");

            var result = GetSecurityCookie(respbody);
            if (result.Success)
            {
                var req2 = new HttpRequestMessage(HttpMethod.Get, result.Url);
                req2.Headers.Add("cookie", result.Cookie);
                var resp2 = await Client.SendAsync(req2);
                if (resp2.IsSuccessStatusCode)
                {
                    var respbody2 = await resp2.Content.ReadAsStringAsync();
                    Log(respbody2, "_Recaptcha_2.html");
                    var bytes = await resp2.Content.ReadAsByteArrayAsync();
                    var ms = new MemoryStream(bytes);
                    return Image.FromStream(ms);
                }
            }
        }
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }

    return null;
}


public result GetSecurityCookie(string respbody)
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(respbody);
    var item = doc.DocumentNode.Descendants().FirstOrDefault(x => x.Name == "script" && !x.Attributes.Any());
    if (item == null)
        return new result() { Success = false };

    string data = item.InnerHtml;
    var ma = System.Text.RegularExpressions.Regex.Match(data, "a\\s*=\\s*toNumbers\\s*\\(\\s*\\\"(\\w+)\\\"\\s*\\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    var mb = System.Text.RegularExpressions.Regex.Match(data, "b\\s*=\\s*toNumbers\\s*\\(\\s*\\\"(\\w+)\\\"\\s*\\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);
    var mc = System.Text.RegularExpressions.Regex.Match(data, "c\\s*=\\s*toNumbers\\s*\\(\\s*\\\"(\\w+)\\\"\\s*\\)", System.Text.RegularExpressions.RegexOptions.IgnoreCase);

    var murl = System.Text.RegularExpressions.Regex.Match(data, "location\\.href\\s*=\\s*\\\"([^\\\"]+)\\\"", System.Text.RegularExpressions.RegexOptions.IgnoreCase);

    var a = ma.Groups[1].Value;
    var b = mb.Groups[1].Value;
    var c = mc.Groups[1].Value;
    var url = murl.Groups[1].Value;

    var des = Decript(c, a, b);

    //       var Uri = new Uri(BaseUrl);
    //       CookieContainer.Add(Uri, new Cookie("ASP.KLR", des) {Path="/"});
    //string cookie = "ASP.KLR=" + des + "; expires=Session; path=/";
    string cookie = "ASP.KLR=" + des;
    return new result() { Success = true , Url = url , Cookie = cookie };
}

To keep the question short, I have left out the ToNumbers, ToHex, and Decript functions.
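For reference, the page's two hex helpers could be ported to C# roughly as follows. This is only a sketch of what such helpers might look like, not the code actually used in the question; the class name ChallengeHex is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper class; only the method names mirror the page's script.
public static class ChallengeHex
{
    // Port of the page's toNumbers(): reads the string two hex digits at a time.
    public static byte[] ToNumbers(string hex)
    {
        var bytes = new List<byte>();
        // Like the JavaScript regex /(..)/g, a trailing unpaired digit is ignored.
        for (int i = 0; i + 1 < hex.Length; i += 2)
        {
            // Unlike parseInt, Convert.ToByte throws on non-hex input.
            bytes.Add(Convert.ToByte(hex.Substring(i, 2), 16));
        }
        return bytes.ToArray();
    }

    // Port of the page's toHex(): two lowercase hex digits per byte.
    public static string ToHex(byte[] bytes)
    {
        var sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }
}
```

Feeding one of the page's strings through ToNumbers and then ToHex should give back the same string, which is a quick way to check the port.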

But the server still responds with the same page; only the attempt count in the URL increases.

Tags: c#, html, httpclient

Solution


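I am no expert in C#, JS, or HTML, but what I noticed is that if you make the request without the ASP.KLR cookie, the server sends that page with the script, which sets the cookie and then redirects to https://www.tuenvio.cu/matanzas/captcha.ashx?attempt=1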
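Since the challenge page apparently comes back with a success status (the code in the question gets past IsSuccessStatusCode and still ends up with HTML), the status code alone does not say which of the two you received. One way to tell them apart is to look at the first bytes of the body. The sketch below checks for common image signatures; the class name ResponseSniffer is hypothetical, and which image format the site really serves is an assumption.

```csharp
using System;

// Hypothetical helper; the set of formats checked here is a guess.
public static class ResponseSniffer
{
    // Returns true when the body starts with a PNG, JPEG, or GIF signature.
    public static bool LooksLikeImage(byte[] body)
    {
        if (body == null || body.Length < 4) return false;

        bool png = body[0] == 0x89 && body[1] == 0x50 && body[2] == 0x4E && body[3] == 0x47;
        bool jpeg = body[0] == 0xFF && body[1] == 0xD8 && body[2] == 0xFF;
        bool gif = body[0] == 0x47 && body[1] == 0x49 && body[2] == 0x46 && body[3] == 0x38;

        return png || jpeg || gif;
    }
}
```

If LooksLikeImage returns false for the downloaded bytes, the body is most likely the script page again and should not be handed to Image.FromStream.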

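So, to get the image, you need to send the cookie the server expects. You should compute it either by running that script somehow, or by implementing it in C# and parsing the page for the data you need.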
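If you go the C# route, the decryption step might look like the sketch below. It rests on assumptions I have not verified against the site: that mode 2 in slowAES.decrypt(c, 2, a, b) means CBC, that a is the key and b the IV, and that the decrypted block is used as-is with no padding removed (the 32-hex-digit cookie in the curl example is at least consistent with one full 16-byte block). The class name ChallengeCookie is made up, and Convert.FromHexString / Convert.ToHexString need .NET 5 or later.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper; the AES parameters are assumptions about the page's script.
public static class ChallengeCookie
{
    // aHex = key, bHex = IV, cHex = ciphertext, all taken from the page's script.
    public static string Compute(string aHex, string bHex, string cHex)
    {
        byte[] key = Convert.FromHexString(aHex);
        byte[] iv = Convert.FromHexString(bHex);
        byte[] cipher = Convert.FromHexString(cHex);

        using (Aes aes = Aes.Create())
        {
            aes.Mode = CipherMode.CBC;       // assumed meaning of mode 2
            aes.Padding = PaddingMode.None;  // assumed: raw block, nothing stripped
            aes.Key = key;
            aes.IV = iv;

            using (ICryptoTransform decryptor = aes.CreateDecryptor())
            {
                byte[] plain = decryptor.TransformFinalBlock(cipher, 0, cipher.Length);
                // The page's toHex() emits lowercase hex.
                return Convert.ToHexString(plain).ToLowerInvariant();
            }
        }
    }
}
```

The returned string would be the value to send as ASP.KLR. Comparing it with the cookie a real browser ends up with for the same a, b, and c is the simplest way to confirm or reject the assumptions above.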

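However, from what I have seen, the response is always the same; see the variables a, b, and c. So perhaps you can run the script just once and use the computed value (the cookie) in all requests. In fact, you can use a browser to see what the value is. That should at least work for some testing. If the response does change the values used to compute the cookie, then to make it work every time without manual intervention you should do what I described in the previous paragraph.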
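To reuse one value across all requests in C#, one option is to put it into the CookieContainer that the handler already uses, much like the commented-out lines in the question's GetSecurityCookie. As far as I know, depending on the runtime, a Cookie header added by hand may not be sent while UseCookies is true, so the container looks like the safer place. A minimal sketch, with a made-up class name:

```csharp
using System;
using System.Net;

// Hypothetical helper around the CookieContainer from the question.
public static class CookieJarHelper
{
    // Registers ASP.KLR for the whole site so the handler attaches it to every request.
    public static void AddChallengeCookie(CookieContainer jar, Uri siteRoot, string value)
    {
        jar.Add(siteRoot, new Cookie("ASP.KLR", value) { Path = "/" });
    }
}
```

It would be called once, for example as CookieJarHelper.AddChallengeCookie(CookieContainer, new Uri("https://www.tuenvio.cu/"), value), before the first captcha request. CookieContainer.GetCookieHeader can then be used to check what would be sent for a given URL.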

Update:

Here is an example of how to get the image using curl:

curl "https://www.tuenvio.cu/carlos3/captcha.ashx" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36" -H "cookie: ASP.KLR=c27408541b70b97e5003d39a2300ffac" --compressed > captcha.png
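A rough C# counterpart of that curl call is sketched below; I have not run it against the site. It sets the cookie header by hand as curl does, so the handler's own cookie handling is switched off, and AutomaticDecompression stands in for --compressed. The class and method names are made up for illustration.

```csharp
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper mirroring the curl command's two headers.
public static class CaptchaRequest
{
    // Builds the GET request with the same user-agent and cookie headers as the curl call.
    public static HttpRequestMessage Build(string url, string cookieValue)
    {
        var req = new HttpRequestMessage(HttpMethod.Get, url);
        req.Headers.TryAddWithoutValidation("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36");
        req.Headers.TryAddWithoutValidation("Cookie", "ASP.KLR=" + cookieValue);
        return req;
    }

    // Sends the request and writes the response body to a file.
    public static async Task SaveAsync(string url, string cookieValue, string path)
    {
        var handler = new HttpClientHandler
        {
            UseCookies = false, // keep the handler from managing the Cookie header itself
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        using (var client = new HttpClient(handler))
        using (var resp = await client.SendAsync(Build(url, cookieValue)))
        {
            resp.EnsureSuccessStatusCode();
            File.WriteAllBytes(path, await resp.Content.ReadAsByteArrayAsync());
        }
    }
}
```

A call such as await CaptchaRequest.SaveAsync("https://www.tuenvio.cu/carlos3/captcha.ashx", "c27408541b70b97e5003d39a2300ffac", "captcha.png") would correspond to the command above, assuming that cookie value is still accepted.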

