How to download a file from a secure website using WinHTTPRequest.5.1

Problem description

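I am trying to silently download a file (PDF) from a website using VBA. So far I can enter the username and password on the initial screen, navigate to the reports page within the site, and retrieve my list of files from a table without problems. I also get the URLs of the relevant files without any trouble. This is where I hit a wall. I do download a file, but when I open it I get a security warning saying I must log in to view it. I can reproduce the same warning by pasting the URL into any browser while not logged in, and the two look identical. So I am downloading, but without authenticating.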

Code for the download step:

Dim strCookie As String
Dim strResponse As String
Dim xobj As Object
Dim WinHttpReq As Object
Dim WinHttpReq2 As Object
Dim oStream As Object
Dim strUrl As String
Dim resp As String

' Set xobj = New WinHttp.WinHttpRequest
strDocLink = "https://atlasbridge.com" & strDocLink & "&RT=PREVMAIL"
Debug.Print strDocLink
' launch tab & goto url/doc
' try to download the link(this is the url of the file)
' strDocLink
Set WinHttpReq = CreateObject("WINHTTP.WinHTTPRequest.5.1")
strUrl = "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.Open "GET", strUrl, False
WinHttpReq.Option(6) = False ' 6 = WinHttpRequestOption_EnableRedirects; the named constant is undefined when late binding with CreateObject
WinHttpReq.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq.setRequestHeader "Connection", "keep-alive"
WinHttpReq.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq.Send
' note: with redirects disabled, a redirect (3xx) response skips this block and strCookie stays empty
If WinHttpReq.Status = 200 Then
    strResponse = WinHttpReq.responseText
    Debug.Print strResponse
    strCookie = WinHttpReq.getResponseHeader("Set-Cookie") ' this only gets one cookie; it seems to include the session ID
    resp = WinHttpReq.getAllResponseHeaders
    ' resp = WinHttpReq.responseBody
    ' strCookie = WinHttpReq.getResponseHeader("Cookie") ' doesn't find the requested header
    Debug.Print strCookie
    Debug.Print resp
End If
' then open second session & try to get document
Set WinHttpReq2 = CreateObject("WINHTTP.WinHTTPRequest.5.1")
WinHttpReq2.Open "GET", strDocLink, False
WinHttpReq2.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
WinHttpReq2.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
WinHttpReq2.setRequestHeader "Accept-Language", "en-US,en;q=0.5"
WinHttpReq2.setRequestHeader "Referer", "https://atlasbridge.com/search/AgencyReports.aspx"
WinHttpReq2.setRequestHeader "Connection", "keep-alive"
WinHttpReq2.setRequestHeader "Host", "atlasbridge.com:443"
WinHttpReq2.setRequestHeader "Accept-Encoding", "gzip, deflate, br"
' WinHttpReq2.setRequestHeader "Transfer-Encoding", "chunked"
' this one is rejected: it causes an error on .Send
WinHttpReq2.setRequestHeader "Cache-Control", "private"
WinHttpReq2.setRequestHeader "Upgrade-Insecure-Requests", "1"
WinHttpReq2.setRequestHeader "Content-Type", "application/pdf"
WinHttpReq2.setRequestHeader "Cookie", strCookie
WinHttpReq2.Send
If WinHttpReq2.Status = 200 Then
    Set oStream = CreateObject("ADODB.Stream")
    oStream.Open
    oStream.Type = 1
    oStream.Write WinHttpReq2.responseBody
    oStream.SaveToFile "C:\Users\MyUserName\Desktop\DownloadedMail\atlasreportdownload.ashx.pdf", 1 ' 1 = no overwrite, 2 = overwrite
    oStream.Close
End If
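Judging by the symptoms, the login step and this download do not appear to share cookies: WinHttpReq2 only receives the single cookie copied from what looks like a fresh, unauthenticated request, never the AuthToken that the browser sends. A common remedy is to send the login request and the download through the same cookie-aware HTTP client. As a language-neutral illustration rather than a drop-in VBA fix, the sketch below does this with Python's standard library (http.cookiejar plus urllib.request). login_url, form_fields and doc_url are hypothetical parameters; the site's real login form (for an ASP.NET site, likely including hidden fields such as __VIEWSTATE) is not shown in the question and is not reproduced here.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """Create an opener whose CookieJar stores every Set-Cookie it receives
    and sends matching cookies back on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_and_download(opener: urllib.request.OpenerDirector,
                       login_url: str, form_fields: dict, doc_url: str) -> bytes:
    """POST a login form, then GET a document, both through the same opener."""
    # Hypothetical login step: form_fields stands in for whatever the real
    # login form posts (user name, password, hidden ASP.NET fields, ...).
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data=data) as resp:
        resp.read()  # the cookie processor has already stored any Set-Cookie headers
    # The session/auth cookies collected above are attached automatically here.
    with opener.open(doc_url) as resp:
        return resp.read()
```

Usage would be along the lines of `pdf = login_and_download(make_session(), login_url, form_fields, doc_url)`, followed by writing `pdf` to disk in binary mode. The equivalent idea in VBA is to keep every Set-Cookie value from the login response and send them all on the document request, rather than reusing a cookie from an anonymous page load.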

I have tried a few different approaches, but I don't believe I'm getting the complete cookie and session ID.

The cookie I get back from WinHttpReq.getResponseHeader("Set-Cookie") and getAllResponseHeaders looks like this:

NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a000000000005e445a4a423660;Version=1;Max-Age=2400;path=/;secure;httponly

But when I use LiveHeaders in Firefox, I see:

Cookie:ASP.NET_SessionId=z2e4adilfjgiyynx2mntnh1k;NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a0a000000000005e445a4a423660;AuthToken=0be22946-a97a-442e-bd93-c80f0c96a525;AtlasLastMessage=1173;lc_sso7549731=1546651094987;__lc.visitor_id.7549731=S1546651090.26728e19e6

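But when I Debug.Print the response, I can't seem to expose the full cookie with the AuthToken, the session ID, and so on. Can someone point me in the right direction so I can test variations of what I'm doing? Thanks in advance.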
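On the parsing side, getResponseHeader("Set-Cookie") appears to hand back only one Set-Cookie header, while the response shown in the update carries two (ASP.NET_SessionId and the NSC_ cookie). Reading getAllResponseHeaders and collecting every Set-Cookie line avoids that. The Python sketch below shows the logic, which ports to VBA with Split and InStr; the function names are hypothetical. It also compares the result with the browser's Cookie header from LiveHeaders: AuthToken does not appear in any response this code received, so no amount of header parsing can produce it. Presumably it is set by the login response, which would have to pass through the same HTTP client.

```python
def cookies_from_raw_headers(raw: str) -> dict:
    """Collect name -> value from every Set-Cookie line in a raw header block,
    such as the string returned by getAllResponseHeaders.

    Attributes (path, secure, HttpOnly, Max-Age, ...) are dropped because a
    client sends back only the name=value pair."""
    jar = {}
    for line in raw.splitlines():
        header, sep, value = line.strip().partition(":")
        if not sep or header.strip().lower() != "set-cookie":
            continue
        pair = value.split(";", 1)[0].strip()  # first segment is name=value
        name, eq, cookie_value = pair.partition("=")
        if eq:
            jar[name.strip()] = cookie_value.strip()  # a later Set-Cookie for the same name wins
    return jar


def cookie_header(jar: dict) -> str:
    """Join collected cookies into the value of a request Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


def missing_cookies(browser_cookie: str, jar: dict) -> list:
    """Cookie names in a browser Cookie header value (without the 'Cookie:'
    prefix, e.g. as seen in LiveHeaders) that are absent from jar."""
    names = [part.split("=", 1)[0].strip() for part in browser_cookie.split(";") if "=" in part]
    return [name for name in names if name not in jar]
```

Fed the headers from the update, cookies_from_raw_headers yields only ASP.NET_SessionId and the NSC_ cookie; missing_cookies against the LiveHeaders value then lists AuthToken, AtlasLastMessage, lc_sso7549731 and __lc.visitor_id.7549731.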

Update: the response headers from the first request:

 Cache-Control: private
 Date: Wed, 16 Jan 2019 22:04:54 GMT
 Content-Length: 164
 Content-Type: text/html; charset=utf-8
 Location: /default.aspx?err=Expired&dest=%2fhome.aspx
 Server: Microsoft-IIS/7.0
 Set-Cookie: ASP.NET_SessionId=mo0owzztbul5of0litxox5kx; path=/; secure; HttpOnly
 Set-Cookie: NSC_bumbtcsjehf.dpn_TTM_443_MCWT=ffffffffc3a00a1a45525d5f4f58455e445a4a423660;Version=1;Max-Age=2400;path=/;secure;httponly
 X-AspNet-Version: 4.0.30319
 X-UA-Compatible: IE=edge
 X-Powered-By: ASP.NET

I am now downloading the response body.
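The Location header in the update is a useful clue: the request for AgencyReports.aspx was answered with a redirect to /default.aspx?err=Expired&dest=%2fhome.aspx, which reads like the site's login page reporting an expired (that is, unauthenticated) session. If that interpretation is right, the cookies in this response belong to a fresh anonymous session, and copying them into WinHttpReq2 cannot authenticate the download. The status line is not shown in the question, so the sketch assumes a 3xx code. A small Python helper (the name login_redirect_info is hypothetical) that pulls the target path and query parameters out of such a Location value:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def login_redirect_info(status: int, location: Optional[str]) -> Optional[dict]:
    """For a 3xx response with a Location header, return the redirect's path
    and its decoded query parameters; otherwise return None.

    A target such as /default.aspx?err=Expired suggests the server did not
    treat the request as authenticated."""
    if not (300 <= status < 400) or not location:
        return None
    parts = urlsplit(location)
    # parse_qs percent-decodes values, so dest=%2fhome.aspx becomes "/home.aspx"
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"path": parts.path, "query": query}
```

In VBA the same check amounts to reading WinHttpReq.Status and the Location response header before trusting any cookies from that response.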

Tags: html, vba, web-scraping, winhttp

Solution

