python-requests cannot download an image: the downloaded image is 0 bytes

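Problem description

I am trying to download a newspaper e-paper (the e-paper pages are images). I use Selenium to log in and read each image's src attribute, and the requests module to download the image.

This is the code I am using (the requests part):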

def download(driver,pageNumber):
    page,filename = pageNumber,""
    if page in range(1,10):
        filename = str(currentDT) + "_kompas_{}"+str(page)+".jpg"
        filename = filename.format(0)
    else: filename = str(currentDT) + "_kompas_"+str(page)+".jpg"
    print("Downloading Page " + str(pageNumber) + " ...")
    div = driver.find_element_by_xpath("//div[@class='page-wrapper' and  @page='" + str(pageNumber) + "']")
    img = div.find_element_by_tag_name("img")
    imgsrc = img.get_attribute("src")
    imgsrc2 = imgsrc.replace("getmedium","getpreview")
    img.click()
    WebDriverWait(driver,200).until(EC.visibility_of_element_located((By.XPATH,"//img[@src = '"+imgsrc2+"']")))
    div2 = driver.find_element_by_xpath("//div[@class='page-wrapper' and @page='" + str(pageNumber) + "']")
    img2 = div2.find_element_by_tag_name("img")
    url = img2.get_attribute("src")
    url = url.replace("https","http")
    print(url)
    url = img2.get_attribute("src")
    r = requests.get(url)
    if r.status_code == 200:
        with open(download_path + "1.jpg", 'wb') as f:
            f.write(r.content)

After running the code, the downloaded image is 0 bytes. When I inspect the response headers with print(r.headers), it prints the following:

{'Date': 'Fri, 28 Sep 2018 06:14:29 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d2770acf5454bb72630a1936eda1930561538115268; expires=Sat, 28-Sep-19 06:14:28 GMT; path=/; domain=.epaper.id; HttpOnly, ci_session=db77e070cbe346e0ac183d686efae9989e8f2096; path=/; HttpOnly', 'X-Powered-By': 'PHP/5.6.37', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Pragma': 'no-cache', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', '

What should I do to solve this problem? Please help.

Tags: python-3.x, web-scraping, python-requests

Solution
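
One plausible cause, judging from the headers in the question: the response has Content-Type text/html and sets a fresh ci_session cookie, which suggests the server treated the requests.get call as a new visitor who is not logged in. requests does not share the Selenium browser's login session, so the image URL may be returning an HTML page (or an empty body) instead of the JPEG. The sketch below is an illustration under that assumption, not a confirmed fix: it copies the cookies from the Selenium driver into a requests.Session and refuses to save anything that is not an image. The helper names session_from_selenium_cookies, looks_like_image and save_image are hypothetical.

```python
import requests


def session_from_selenium_cookies(selenium_cookies, user_agent=None):
    """Build a requests.Session that carries the browser's cookies.

    selenium_cookies is the list of dicts returned by driver.get_cookies().
    """
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            # Fall back to defaults when the browser omits these fields.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    if user_agent:
        # Some sites tie a session to the browser's User-Agent string.
        session.headers["User-Agent"] = user_agent
    return session


def looks_like_image(headers):
    """Return True if the response headers announce an image body."""
    return headers.get("Content-Type", "").lower().startswith("image/")


def save_image(session, url, target_path):
    """Download url with the given session and write it to target_path."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    if not looks_like_image(response.headers):
        # An HTML response here usually means a login or error page.
        raise ValueError(
            "Expected an image but got " + response.headers.get("Content-Type", "unknown")
        )
    with open(target_path, "wb") as f:
        f.write(response.content)
    return len(response.content)
```

Inside download(), this could be used as session = session_from_selenium_cookies(driver.get_cookies()) followed by save_image(session, url, download_path + filename). Passing filename also avoids overwriting "1.jpg" on every page, since the original code computes filename but never uses it. If the server still answers with HTML, the Referer or User-Agent header may matter as well; that depends on the site and is not verified here.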

