首页 > 解决方案 > 使用 VBA-Excel 从受保护的网站下载文件

问题描述

我正在尝试从受保护的网页下载文件(来自我的工作,所以我无法发布网址)。

当我在 .xls 中保存字节(我正在使用 WinHTTP 请求)时,文件似乎已损坏。

我的代码(没有敏感信息)和“HTTP Header Live”采用的一些代码。

option explicit

Sub SaveFileFromURL()

Dim FileNum As Long
Dim FileData() As Byte
Dim WHTTP As Object
Dim SetCookieString As String
Dim mainUrl As String
Dim fileUrl As String
Dim SavePath As String
Dim strAuthenticate As String

'this is the url to login, extracted with HTTP Header Live.
'however, the url shown in the browser is: "https://www.website.com/retro/default.asp?idioma=ES"
mainUrl = "https://www.website.com/retro/logincheck.asp"

'this is the url for to download the file, but need to send a cookie as credentials
'(if you login via browser and paste the link, it will popup the saveas dialog, but if you don't login, it will return an internal error)
fileUrl = "https://www.website.com/retro/VerBordero.asp?id=27348&p=3º Trimestre 2019&n=0&m=UNKNOWN&con=CIRCULAR&fmt=xls"

'the path where to save the file with the extension I know it will have.
SavePath = "C:\Users\Victor\Desktop\bordero.xls"

'authetication should be:    
strAuthenticate = "txtUser=MYUSER&txtpwd=MYPASS&lg=es"

'I login to catch the cookie that it is suppose to allow me to download the file.
Set WHTTP = CreateObject("WinHTTP.WinHTTPrequest.5.1")
WHTTP.Open "POST", mainUrl, False
WHTTP.Send strAuthenticate

'if it is correct:
If WHTTP.Status = 200 Then

    'I get the cookie
    SetCookieString = WHTTP.getResponseHeader("Set-Cookie")

    'Then you have to GET direct file url
    WHTTP.Open "GET", fileUrl, False
    WHTTP.setRequestHeader "Content-Type", "application/x-msexcel"
    WHTTP.setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
    WHTTP.setRequestHeader "Accept-Language", "es-ES,es;q=0.8,en-US;q=0.5,en;q=0.3"
    WHTTP.setRequestHeader "Cookie", SetCookieString
    WHTTP.Send

    'if i get the file
    If WHTTP.Status = 200 Then

        'download bytes
        FileData = WHTTP.responseBody
        Set WHTTP = Nothing

        'Save the bytes into file
        FileNum = FreeFile
        Open SavePath For Binary Access Write As #FileNum
            Put #FileNum, 1, FileData
        Close #FileNum

    End If

End If

End Sub

代码运行。我在所有检查状态下都得到“OK”,但是当我尝试打开文件(有 622kb,与手动下载相同)时,Excel 会提示我一条消息,指出单元格的数据过多。如果我点击“仍然打开”,我可以阅读一些部分。

看起来文件正在以不同的编码保存。

来自 HTTP Header Live 的一些代码。

登录:
https
://www.website.com/retro/logincheck.asp 主机:www.website.com
用户代理:Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9, / ;q=0.8
Accept-Language: es-ES,es;q=0.8,en-US;q=0.5,en;q =0.3
Accept-Encoding: gzip, deflate, br
Content-Type: application/x-www-form-urlencoded
Content-Length: 38
Origin: https://www.website.com
Connection: keep-alive
Referer: https:// /www.website.com/retro/default.asp?idioma=ES
Cookie:cc_social=yes;cc_analytics=是;cc_advertising=是;cc_necessary=是;_ga=GA1.2.859443177.1570897852; __utma=136336428.859443177.1570897852.1570897852.1570897852.1;__utmz=136336428.1570897852.1.1.utmcsr=google|utmccn=(有机)|utmcmd=有机| utmctr=(未提供%20); ASPSESSIONIDQWSBSQAD=LDIIIENCGKDHMEJEGKJKBDMM; cookieconsent=解雇;ASPSESSIONIDQWQCRRBD=PCOIENHDNLHCKNODMJLKAFGM
Upgrade-Insecure-Requests: 1
txtUser=MYUSER&txtpwd=MYPASS&lg=es
POST: HTTP/2.0 302 发现
日期: Wed, 04 Dec 2019 22:20:17 GMT
server: Microsoft-IIS/8.5
cache-control: private
content -type: text/html
expires: Wed, 04 Dec 2019 22:20:18 GMT
位置:retro.asp
内容长度:130
via:2.0 www.website.com
X-Firefox-Spdy:h2

文件:
https: //www.website.com/retro/VerBorderoGRxls.asp? id=27348&p=3º Trimestre 2019&n=0&m=UNKNOWN&con=CIRCULAR&fmt=xls
主机:www.website.com
用户代理:Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9, / ;q=0.8
Accept-Language: es-ES,es ;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://www.website.com/retro/borderos_resumen.asp
Cookie:cc_social=yes;cc_analytics=是;cc_advertising=是;cc_necessary=是;_ga=GA1.2.859443177.1570897852; __utma=136336428.859443177.1570897852.1570897852.1570897852.1;__utmz=136336428.1570897852.1.1.utmcsr=google|utmccn=(有机)|utmcmd=有机| utmctr=(未提供%20); ASPSESSIONIDQWSBSQAD=LDIIIENCGKDHMEJEGKJKBDMM; cookieconsent=解雇;ASPSESSIONIDQWQCRRBD=PCOIENHDNLHCKNODMJLKAFGM
升级-不安全-请求:1

GET:HTTP/2.0 200 OK
日期:2019 年 12 月 4 日星期三 22:21:40 GMT
服务器:Microsoft-IIS/8.5
缓存控制:私有
内容类型:应用程序/x-msexcel
过期:2019 年 12 月 4 日,星期三 22:格林威治标准时间 21:40
内容长度:637440
通过:2.0 www.website.com
X-Firefox-Spdy:h2

标签: excelvbawinhttprequestdownloading-website-filesprotected-resource

解决方案


推荐阅读