How to download all binary files (images) using requests() and open()?

Problem description

When I download an image from one URL, the code works, but when I try another URL, it does not. It only creates the file; the file does not contain the image.

# This doesn't work.
import requests
url = 'https://ryanspressurewashing.com/wp-content/uploads/2017/06/metal-roof-after-pressure-washing.jpg'

r = requests.get(url, stream=True)
with open('image3.jpg', 'wb') as my_file:
    # Read in 4 KB chunks
    for byte_chunk in r.iter_content(chunk_size=4096):
        my_file.write(byte_chunk)



#  This Works?

import requests
url = 'http://www.webscrapingfordatascience.com/files/kitten.jpg'
r = requests.get(url, stream=True)
with open('image.jpg', 'wb') as my_file:
    # Read in 4 KB chunks
    for byte_chunk in r.iter_content(chunk_size=4096):
        my_file.write(byte_chunk)

Tags: python-3.x, image-processing, web-scraping, download, python-requests

Solution


Different sites may use different security systems to block scripts and bots.

When you open image3.jpg in a text editor, you will see:

<head>
<title>Not Acceptable!</title>
</head>
<body>
<h1>Not Acceptable!</h1>
<p>An appropriate representation of the requested resource could not be found on  this server. 
This error was generated by Mod_Security.</p>
</body>
</html>
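The same check can be done without a text editor. The following is a minimal sketch, assuming the expected file is a JPEG: JPEG data starts with the bytes FF D8 FF, whereas the error page above starts with HTML text. The helper name `looks_like_jpeg` is hypothetical and not part of requests.

```python
# JPEG files begin with these three bytes.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(path):
    """Return True if the file at `path` starts with the JPEG signature."""
    with open(path, "rb") as f:
        return f.read(len(JPEG_MAGIC)) == JPEG_MAGIC
```

Calling `looks_like_jpeg('image3.jpg')` on the file saved by the failing script should return False, because that file holds the HTML error page rather than image data.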

Some servers may require correct headers, cookies, a session ID, etc. before they let you access data.

This site requires a proper User-Agent header:

import requests

url = 'https://ryanspressurewashing.com/wp-content/uploads/2017/06/metal-roof-after-pressure-washing.jpg'

headers = {
  'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}

r = requests.get(url, stream=True, headers=headers)

with open('image3.jpg', 'wb') as my_file:
    # Read in 4 KB chunks
    for byte_chunk in r.iter_content(chunk_size=4096):
        my_file.write(byte_chunk)
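The download loop above writes whatever the server returns, which is how an HTML error page ended up in image3.jpg. One possible guard, shown here as a sketch rather than a definitive fix, is to check the status code and the Content-Type header before writing. The helper name `save_if_image` is hypothetical; `response` is assumed to behave like a requests.Response (it needs `status_code`, `headers` and `iter_content`).

```python
def save_if_image(response, path, chunk_size=4096):
    """Write the response body to `path` only if it looks like an image.

    Returns True when the file was written, False otherwise.
    """
    content_type = response.headers.get("Content-Type", "")
    # Skip error responses and non-image bodies (e.g. an HTML error page).
    if response.status_code != 200 or not content_type.startswith("image/"):
        return False
    with open(path, "wb") as my_file:
        # Read in chunks so large files are not held in memory at once.
        for byte_chunk in response.iter_content(chunk_size=chunk_size):
            my_file.write(byte_chunk)
    return True
```

With the code above, this could be called as `save_if_image(r, 'image3.jpg')`; a False result means no file was written and the response is worth inspecting.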

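By default, requests sends User-Agent: python-requests/2.21.0 (the version number depends on the installed release), so a site can easily recognize the script and block it.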
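When several files have to be downloaded, one option is to set the User-Agent once on a requests.Session so that every request sends it; a Session also keeps cookies between requests. The following is only a sketch under assumptions: `download_all`, `dest_dir` and the fallback name `download.bin` are hypothetical, and `session` is assumed to behave like a requests.Session.

```python
import os
from urllib.parse import urlsplit


def download_all(session, urls, dest_dir, chunk_size=4096):
    """Download each URL into `dest_dir` and return the saved paths.

    Responses with a status other than 200 are skipped.
    Files with the same name overwrite each other.
    """
    saved = []
    for url in urls:
        r = session.get(url, stream=True)
        if r.status_code != 200:
            continue
        # Use the last path segment of the URL as the file name.
        filename = os.path.basename(urlsplit(url).path) or "download.bin"
        path = os.path.join(dest_dir, filename)
        with open(path, "wb") as my_file:
            for byte_chunk in r.iter_content(chunk_size=chunk_size):
                my_file.write(byte_chunk)
        saved.append(path)
    return saved
```

A session for this could be prepared with `session = requests.Session()` followed by `session.headers.update({'User-Agent': '...'})`, using a browser-like User-Agent string.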

You can check which headers your script sends by requesting https://httpbin.org/get

import requests

r = requests.get('https://httpbin.org/get')
print(r.text)

Result:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.21.0"
  }, 
  "origin": "83.23.39.165, 83.23.39.165", 
  "url": "https://httpbin.org/get"
}

See httpbin.org for more features.

