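libcurl does not download image files correctly

Problem description

I have created this very basic curl wrapper and can use it to download HTML pages, but I run into a problem when I try to fetch an image (I have not tried other file types).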

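#include <curl/curl.h>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>

class BasicCurlWrapper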
{
    CURL* m_curlHandle{ nullptr };
    std::string m_current_url{};
    std::string m_destinationFilePath{};
    std::ofstream m_outputFile{};
    std::ios_base::openmode m_fileOpenMode{ std::ios::out };
    bool m_verbose{ false };

public:
    BasicCurlWrapper()
    {
        m_curlHandle = curl_easy_init();
    }

    ~BasicCurlWrapper()
    {
        curl_easy_cleanup(m_curlHandle);
        //curl_global_cleanup();
    }

    void downloadUrl(const std::string& url, const std::string& destination, std::ios_base::openmode openmode = std::ios::out) 
    {
        if (m_outputFile.is_open()) {
            m_outputFile.close();
        }

        m_current_url = url;
        m_destinationFilePath = destination;
        m_fileOpenMode = openmode;
        char errbuf[CURL_ERROR_SIZE] = { 0 };

        curl_easy_setopt(m_curlHandle, CURLOPT_URL, url.data());        
        curl_easy_setopt(m_curlHandle, CURLOPT_VERBOSE, m_verbose ? 1L : 0L); //Switch on full protocol/debug output while testing        
        curl_easy_setopt(m_curlHandle, CURLOPT_NOPROGRESS, 1L); //disable progress meter, set to 0L to enable it
        curl_easy_setopt(m_curlHandle, CURLOPT_FOLLOWLOCATION, 1L);
        curl_easy_setopt(m_curlHandle, CURLOPT_USERAGENT, "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36");
        curl_easy_setopt(m_curlHandle, CURLOPT_WRITEFUNCTION, BasicCurlWrapper::write_data);
        curl_easy_setopt(m_curlHandle, CURLOPT_WRITEDATA, this);
        curl_easy_setopt(m_curlHandle, CURLOPT_FAILONERROR, 1L);
        curl_easy_setopt(m_curlHandle, CURLOPT_ERRORBUFFER, errbuf);
        //curl_easy_setopt(m_curlHandle, CURLOPT_ACCEPT_ENCODING, "");
        //curl_easy_setopt(m_curlHandle, CURLOPT_SSLCERT, "C:/msys64/mingw64/ssl/certs/ca-bundle.crt");

        auto res = curl_easy_perform(m_curlHandle);

        if (m_outputFile.is_open()) {
            m_outputFile.close();
        }

        if (res == CURLE_OK) {
            std::cout << "Downloaded file\n";
        } else {
            std::cout << "ERROR: " << curl_easy_strerror(res) << '\n' << errbuf << '\n';
        }
    }


    void setVerbose(bool cond)
    {
        m_verbose = cond;
    }

    //https://curl.haxx.se/mail/lib-2008-09/0250.html
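    static std::size_t write_data(const char* ptr, const std::size_t size, const std::size_t nmemb, void* classInstance)
    {
        // libcurl hands over size * nmemb bytes; the buffer is not null-terminated
        const std::size_t total = size * nmemb;
        if (total > 0) {
            static_cast<BasicCurlWrapper*>(classInstance)->writeToFile(ptr, total);
        }
        return total;
    }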

private:

    void writeToFile(const char* ptr, const std::size_t nmemb)
    {
        if (!m_outputFile.is_open()) {
            m_outputFile.open(m_destinationFilePath, m_fileOpenMode);
        }        

        if (m_outputFile.is_open()) {
            std::cout << "Writing data amount: " << nmemb << '\n';
            m_outputFile.write(ptr, nmemb);
        } else {
            auto errorMsg{ std::string{"Unable to open file: " + m_destinationFilePath } };
            throw std::runtime_error{ errorMsg };
        }
    }
};

I use it like this:

 BasicCurlWrapper cr;
 cr.setVerbose(true);
 cr.downloadUrl("https://icons.iconarchive.com/icons/google/noto-emoji-activities/512/52730-soccer-ball-icon.png", "ball.png", std::ios::out | std::ios::binary);

This does download something:

‰PNG

¾M&S»Á€&gt;öÝÀKþ駟ªC²²²Ð½{wÕ5–-[†…*7Þx½zõ¢C˜ž––L›6
555ŠÛŽ1þ³ºÂr­­­'­Å·Íê>ð^ùpAmèÀŽãœ.—«–@èEÀŒ±yJÛ)©éâàÔóÚÄ™ÄA]]¦NŠ¦æfÅ÷uÍ5Tò—+Ö­[‡¾òŠªúÕ×^CvŸ>gtò'­É·ý›œü¹QYñÇÝér¹þmöçpÁð^¯w€AJÛFâR€–tîܹ=Ï cä`íÚµX»v­âëÙív,X°€ªþa…$I¸ë®»T•¾ðÂqß}÷µÏàÛÖä:„ŠŠ
Šbª$€Ðÿ.

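Although it starts with PNG, this is not a valid PNG file, and the original file is 39 KB. Do I have to send some extra headers or something similar? I would like to be able to download any given file.

I used vcpkg to get libcurl: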

curl:x64-windows                                   7.68.0

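Edit:

I have updated the code to reflect the answer from @Some programmer dude: I now use write to output the data to the file. This fixed the example image I was using.

The problem I have now is with another image I am trying to download.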

cr.downloadUrl("https://v217.mangabeast.com/manga/Onepunch-Man/0130-007.png", "image.png", std::ios::out | std::ios::binary);

The file image.png now contains the following text:

error code: 1010

I can download this image simply by using the following command:

curl -O <url>

I am not passing anything extra to the curl command, so what do I need to pass in libcurl?

Here is the output of the request:

 * STATE: INIT => CONNECT handle 0x24781b66728; line 1605 (connection #-5000)
 * Added connection 0. The cache now contains 1 members
 * STATE: CONNECT => WAITRESOLVE handle 0x24781b66728; line 1646 (connection #0)
 *   Trying 104.31.15.158:443...
 * TCP_NODELAY set
 * STATE: WAITRESOLVE => WAITCONNECT handle 0x24781b66728; line 1725 (connection #0)
 * Connected to v217.mangabeast.com (104.31.15.158) port 443 (#0)
 * STATE: WAITCONNECT => SENDPROTOCONNECT handle 0x24781b66728; line 1781 (connection #0)
 * Marked for [keep alive]: HTTP default
 * schannel: SSL/TLS connection with v217.mangabeast.com port 443 (step 1/3)
 * schannel: checking server certificate revocation
 * schannel: sending initial handshake data: sending 184 bytes...
 * schannel: sent initial handshake data: sent 184 bytes
 * schannel: SSL/TLS connection with v217.mangabeast.com port 443 (step 2/3)
 * schannel: failed to receive handshake, need more data
 * STATE: SENDPROTOCONNECT => PROTOCONNECT handle 0x24781b66728; line 1796 (connection #0)
 * schannel: SSL/TLS connection with v217.mangabeast.com port 443 (step 2/3)
 * schannel: encrypted data got 2709
 * schannel: encrypted data buffer: offset 2709 length 4096
 * schannel: sending next handshake data: sending 93 bytes...
 * schannel: SSL/TLS connection with v217.mangabeast.com port 443 (step 2/3)
 * schannel: encrypted data got 258
 * schannel: encrypted data buffer: offset 258 length 4096
 * schannel: SSL/TLS handshake complete
 * schannel: SSL/TLS connection with v217.mangabeast.com port 443 (step 3/3)
 * schannel: stored credential handle in session cache
 * STATE: PROTOCONNECT => DO handle 0x24781b66728; line 1815 (connection #0)
> GET /manga/Onepunch-Man/0130-007.png HTTP/1.1
Host: v217.mangabeast.com
User-Agent: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.89 Safari/537.36
Accept: */*

 * STATE: DO => DO_DONE handle 0x24781b66728; line 1870 (connection #0)
 * STATE: DO_DONE => PERFORM handle 0x24781b66728; line 1991 (connection #0)
 * schannel: client wants to read 16384 bytes
 * schannel: encdata_buffer resized 17408
 * schannel: encrypted data buffer: offset 0 length 17408
 * schannel: encrypted data got 674
 * schannel: encrypted data buffer: offset 674 length 17408
 * schannel: decrypted data length: 611
 * schannel: decrypted data added: 611
 * schannel: decrypted cached: offset 611 length 16384
 * schannel: encrypted data length: 34
 * schannel: encrypted cached: offset 34 length 17408
 * schannel: decrypted data length: 5

Edit 2:

I have now added some error checking and enabled fail-on-error (CURLOPT_FAILONERROR). I get the following:

ERROR: HTTP response code said error
The requested URL returned error: 403 Forbidden

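I do not understand how I get a 403 when I can fetch the image with cURL from the command line.

Edit 3:

I just noticed that the user agent string itself contained "User-Agent:". After putting in a valid user agent, I got the file!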
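
The fix in Edit 3 comes down to CURLOPT_USERAGENT expecting only the header value; libcurl adds the header name itself, which is why the request in the verbose output above shows "User-Agent: User-Agent: ...". A minimal sketch of a guard against this mistake, assuming a hypothetical helper named stripUserAgentPrefix (it is not part of libcurl, and the match is case-sensitive):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of libcurl): removes a leading
// "User-Agent:" header name, if present, so that only the header value
// is handed to CURLOPT_USERAGENT.
std::string stripUserAgentPrefix(const std::string& value)
{
    const std::string prefix{ "User-Agent:" };
    if (value.compare(0, prefix.size(), prefix) != 0) {
        return value; // no header name in front, use the value as-is
    }
    // Skip the prefix and any spaces that follow it.
    const auto start = value.find_first_not_of(' ', prefix.size());
    return start == std::string::npos ? std::string{} : value.substr(start);
}

// Intended use inside downloadUrl (sketch, rawAgent is an assumed variable):
//   const auto agent = stripUserAgentPrefix(rawAgent);
//   curl_easy_setopt(m_curlHandle, CURLOPT_USERAGENT, agent.c_str());
```

Simply passing the bare value, without any helper, is of course enough; the function only illustrates what the option expects.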

Tags: c++, http, libcurl

Solution


You have two problems, both of which stem from treating the received data as text.

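The first problem is that you open the file in text mode, which can mean that some bytes are translated into other bytes (or even into several bytes). The most common such translation affects the newline '\n', which on Windows is typically written as the two-character sequence "\r\n".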
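
To see the effect, here is a small sketch that writes the same bytes in a given open mode and reports the resulting file size. The helper names and any file names passed in are made up for illustration. On Windows the text-mode file is expected to come out larger than the input, while on Linux and macOS both modes give the same size because no translation takes place.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <string>

// Writes `bytes` to `path` with the given open mode and returns the
// resulting file size in bytes (or -1 if the file cannot be reopened).
long long writeAndMeasure(const std::string& path, const std::string& bytes,
                          std::ios_base::openmode mode)
{
    {
        std::ofstream out(path, mode | std::ios::trunc);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    } // the stream is flushed and closed here

    std::ifstream in(path, std::ios::binary | std::ios::ate);
    return in ? static_cast<long long>(in.tellg()) : -1;
}

// The first 8 bytes of every PNG file; note the two embedded '\n' bytes,
// which a text-mode stream on Windows would expand.
std::string pngSignature()
{
    return std::string("\x89PNG\r\n\x1a\n", 8);
}
```

Calling writeAndMeasure with std::ios::out | std::ios::binary should always report the exact byte count, which is what a downloaded image needs.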

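The second problem is that your writeToFile function assumes the data is a null-terminated string, which it is not. The null terminator of a string is simply a byte with the value 0, and arbitrary binary data such as a PNG image will contain zero bytes. You need to write the data with the write function, passing the actual length of the data in bytes; the cURL write callback receives that length as size * nmemb (libcurl documents size as always being 1).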
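
The difference is easy to reproduce without libcurl. A minimal sketch, using an in-memory std::ostringstream in place of the output file and two illustrative helper names; it assumes the truncation comes from streaming a char pointer with operator<<, which is the usual way this mistake appears:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Treats the buffer as a C string: output stops at the first zero byte.
std::string viaStreamInsertion(const char* data)
{
    std::ostringstream out;
    out << data;
    return out.str();
}

// Treats the buffer as raw bytes: exactly `length` bytes are written.
std::string viaWrite(const char* data, std::size_t length)
{
    std::ostringstream out;
    out.write(data, static_cast<std::streamsize>(length));
    return out.str();
}
```

For a five-byte buffer holding 'a', 'b', 0, 'c', 'd', the first function yields only "ab", while the second keeps all five bytes.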

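To solve the first problem, open the file in binary mode by adding the std::ios::binary flag when you open it. The second problem is solved by using the write function mentioned above.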
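
Putting both fixes together, here is a hedged sketch of a write callback with the signature libcurl expects for CURLOPT_WRITEFUNCTION. It is written against the standard library only, so the curl_easy_setopt calls appear as comments. The name writeToStream and the choice to pass a std::ofstream through CURLOPT_WRITEDATA are assumptions made for this example, not the asker's final code.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>

// Same shape as libcurl's write callback: `ptr` points at size * nmemb
// bytes that are NOT null-terminated; `userdata` is whatever was set
// with CURLOPT_WRITEDATA (assumed here to be a std::ofstream*).
std::size_t writeToStream(char* ptr, std::size_t size, std::size_t nmemb, void* userdata)
{
    auto* out = static_cast<std::ofstream*>(userdata);
    const std::size_t total = size * nmemb;
    out->write(ptr, static_cast<std::streamsize>(total));
    // A returned count that differs from `total` is treated as an error
    // by libcurl, which then aborts the transfer.
    return out->good() ? total : 0;
}

// Hooking it up (sketch, requires <curl/curl.h> and a valid easy handle):
//   std::ofstream file("ball.png", std::ios::out | std::ios::binary);
//   curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, writeToStream);
//   curl_easy_setopt(handle, CURLOPT_WRITEDATA, &file);
```

Unlike the wrapper in the question, this version reports a failed write through the return value instead of throwing, which avoids propagating a C++ exception through libcurl's C code.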

