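python - How to brute-force a "U-turn" against a redirect
Problem description
Hey StackOverflow,
I'm trying to save some images with the Python requests library, but I've run into trouble saving images from a Chinese website.
I have 3 sample snippets to illustrate my problem:
- My ideal model case, where I save a simple image
- Status code: 200. Input URL and final URL differ. A dummy image is saved
- Status code: 302. Input URL and final URL are the same. A strange image is saved
Functions: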
import requests

def get_response(url):
    # Fetch the URL, following redirects (the requests default).
    print('Input URL:\n\t %s' % url)
    response = requests.get(url)
    return response

def get_response_dont_redirect(url):
    # Fetch the URL but return the first response as-is, even if it is a redirect.
    print('Input URL:\n\t %s' % url)
    response = requests.get(url, allow_redirects=False)
    return response

def check_response_status(response):
    # Print the status code (and URL where relevant) and return a short label.
    status = response.status_code
    if status == 200:
        print('Final URL:\n\t %s' % response.url)
        print('Status Code: %s / OK' % status)
        return 'ok'
    if status == 302:
        print('Final URL:\n\t %s' % response.url)
        print('Status Code: %s / Redirected' % status)
        return 'redirected'
    if status == 404:
        # 404 means "Not Found"; "access denied" would be 403.
        print('Status Code: %s / Not Found' % status)
        return 'not_found'

def save_image(response, status_code):
    # status_code only selects the output file name; 111 is an arbitrary
    # marker for the "normal" case, not a real HTTP status.
    if status_code == 302:
        with open('image_wanted.jpg', 'wb') as f:
            print('\nSaving image desired under "image_wanted.jpg"...\n')
            f.write(response.content)
    elif status_code == 200:
        with open('image_redirect.jpg', 'wb') as f:
            print('\nSaving image redirected under "image_redirect.jpg"...\n')
            f.write(response.content)
    elif status_code == 111:
        with open('image_normal.jpg', 'wb') as f:
            print('\nSaving image normal under "image_normal.jpg"...\n')
            f.write(response.content)

def case_1_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# This is the ideal situation where I can simply download an image')
    print('-------------------------------------------------------------------')

def case_2_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Notice that despite the status code being 200, the input URL and final URL are different')
    print('\t>I am definitely being redirected')
    print('\t>I get a dummy image from the redirected page')
    print('-------------------------------------------------------------------')

def case_3_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Here I have set the restriction of "allow_redirects=False" yet I get status code: 302')
    print('\t>Somehow the input and final URL are the same')
    print('\t>The image saved is perpetually loading...')
    print('-------------------------------------------------------------------')
Case 1: Ideal case
print("\n\n--- Case 1: Ideal ---\n")
url = 'https://i5.walmartimages.ca/images/Large/094/514/6000200094514.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 111)
case_1_comments()
Case 2: without 'allow_redirects=False'
print("\n\n--- Case 2: without 'allow_redirects=False' restriction ---\n")
url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 200)
case_2_comments()
Case 3: with 'allow_redirects=False'
print("\n\n--- Case 3: with 'allow_redirects=False' restriction ---\n")
url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response_dont_redirect(url)
status = check_response_status(response)
save_image(response, 302)
case_3_comments()
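If you copy-paste my code and run it (see the full script at the end of this question, and pip install requests if you haven't already), you'll see that cases 2 and 3 behave very oddly. My goal is to force my way back to the input URL and save the image found on that page.
As case 3 shows, I have managed to get back to that page, but for some reason the image is just a loading screen.
So I guess my questions are:
- Am I actually countering the redirect?
- How do I save the image I want without getting the loading image?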
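On the first question: when redirects are not followed, the client simply hands back the first response, so the URL on that response is the one that was requested and the body is whatever the server attached to the 302, not the target image. The sketch below reproduces this against a throwaway local server, so it runs without touching the real site. It uses the standard library's http.client (which never follows redirects) in place of requests; DemoHandler, the /small.jpg and /placeholder.jpg paths, and the body bytes are all made up for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /small.jpg answers 302 and points at /placeholder.jpg."""

    def do_GET(self):
        if self.path == "/small.jpg":
            body = b"redirect body, not the image"
            self.send_response(302)
            self.send_header("Location", "/placeholder.jpg")
        else:
            body = b"placeholder image bytes"
            self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_without_redirect(host, port, path):
    """Send one GET and return (status, Location header, body) without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def run_demo():
    """Start the local server on a free port, fetch /small.jpg once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return fetch_without_redirect(host, port, "/small.jpg")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, location, body = run_demo()
    print(status, location, body)
```

With requests, the equivalent things to look at would be response.status_code, response.headers.get('Location') and response.content on the case 3 response, and response.history on the case 2 response to see which hops were followed.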
Here is the whole script to run (pardon the spaghetti):
import requests

def get_response(url):
    # Fetch the URL, following redirects (the requests default).
    print('Input URL:\n\t %s' % url)
    response = requests.get(url)
    return response

def get_response_dont_redirect(url):
    # Fetch the URL but return the first response as-is, even if it is a redirect.
    print('Input URL:\n\t %s' % url)
    response = requests.get(url, allow_redirects=False)
    return response

def check_response_status(response):
    # Print the status code (and URL where relevant) and return a short label.
    status = response.status_code
    if status == 200:
        print('Final URL:\n\t %s' % response.url)
        print('Status Code: %s / OK' % status)
        return 'ok'
    if status == 302:
        print('Final URL:\n\t %s' % response.url)
        print('Status Code: %s / Redirected' % status)
        return 'redirected'
    if status == 404:
        # 404 means "Not Found"; "access denied" would be 403.
        print('Status Code: %s / Not Found' % status)
        return 'not_found'

def save_image(response, status_code):
    # status_code only selects the output file name; 111 is an arbitrary
    # marker for the "normal" case, not a real HTTP status.
    if status_code == 302:
        with open('image_wanted.jpg', 'wb') as f:
            print('\nSaving image desired under "image_wanted.jpg"...\n')
            f.write(response.content)
    elif status_code == 200:
        with open('image_redirect.jpg', 'wb') as f:
            print('\nSaving image redirected under "image_redirect.jpg"...\n')
            f.write(response.content)
    elif status_code == 111:
        with open('image_normal.jpg', 'wb') as f:
            print('\nSaving image normal under "image_normal.jpg"...\n')
            f.write(response.content)

def case_1_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# This is the ideal situation where I can simply download an image')
    print('-------------------------------------------------------------------')

def case_2_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Notice that despite the status code being 200, the input URL and final URL are different')
    print('\t>I am definitely being redirected')
    print('\t>I get a dummy image from the redirected page')
    print('-------------------------------------------------------------------')

def case_3_comments():
    print('-------------------------------------------------------------------')
    print('#Comments:')
    print('# Here I have set the restriction of "allow_redirects=False" yet I get status code: 302')
    print('\t>Somehow the input and final URL are the same')
    print('\t>The image saved is perpetually loading...')
    print('-------------------------------------------------------------------')

print("\n\n--- Case 1: Standard procedure ---\n")
url = 'https://i5.walmartimages.ca/images/Large/094/514/6000200094514.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 111)
case_1_comments()
print("\n\n--- Case 2: without 'allow_redirects=False' restriction ---\n")
url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response(url)
status = check_response_status(response)
save_image(response, 200)
case_2_comments()
print("\n\n--- Case 3: with 'allow_redirects=False' restriction ---\n")
url = 'http://photo.yupoo.com/evakicks/6b3a8a2a/small.jpg'
response = get_response_dont_redirect(url)
status = check_response_status(response)
save_image(response, 302)
case_3_comments()
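
Because cases 2 and 3 write whatever bytes came back straight into a .jpg file, one way to catch a non-image body (an HTML page, an empty redirect body) is to check the leading bytes before saving. A minimal sketch, assuming only JPEG, PNG and GIF matter; sniff_image_type and save_if_image are hypothetical helpers, not part of requests. Note that this cannot tell a valid placeholder image apart from the wanted one, since both are real images.

```python
# Leading "magic" bytes of a few common image formats.
IMAGE_SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' if data starts like one, else None."""
    for name, prefixes in IMAGE_SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if data.startswith(prefixes):
            return name
    return None


def save_if_image(data, path):
    """Write data to path only when it looks like an image; return the detected type."""
    kind = sniff_image_type(data)
    if kind is not None:
        with open(path, "wb") as f:
            f.write(data)
    return kind
```

In the script above this could be called as save_if_image(response.content, 'image_wanted.jpg'); a None result would mean the response body was not one of the recognised image formats and nothing was written.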