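python - Downloading photos from URLs in a CSV file – urllib.error.HTTPError: HTTP Error 403: Forbidden
Problem description
My script below is supposed to download a batch of images from a list of URLs, but it keeps failing with HTTP Error 403: Forbidden, raised from raise HTTPError(req.full_url, code, msg, hdrs, fp):
urllib.error.HTTPError: HTTP Error 403: Forbidden
I'm not sure what to do. You can run it yourself; I have provided everything below.
Any help would be greatly appreciated (:
The goal is to download a batch of images from the list of URLs in a CSV file without getting a 403 error.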
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import requests
import praw
import csv
r = praw.Reddit(client_id=client_id,
                client_secret=client_secret,
                user_agent=user_agent,
                username=username,
                password=password)
subred = r.subreddit("partyparrot")
top = subred.top(limit = 780)
type(top)
x = next(top)
dir(x)
with open("output_reddit.csv", 'r') as csvfile:
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'none',
        'Accept-Language': 'en-US,en;q=0.8',
        'Connection': 'keep-alive',
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'GET',
        'Access-Control-Allow-Headers': 'Content-Type',
        'Access-Control-Max-Age': '3600',
        'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'
    }
    for line in csvfile:
        splitted_line = line.split('||')
        if splitted_line[2] != '' and splitted_line[2] != "\n" and ".png" in splitted_line[2]:
            urllib.request.urlretrieve(splitted_line[2], filename=("img_" + splitted_line[0] + ".png"))
            print("Image saved for {0}".format(splitted_line[0]))
        elif splitted_line[2] != '' and splitted_line[2] != "\n" and ".jpg" in splitted_line[2]:
            urllib.request.urlretrieve(splitted_line[2], filename=("img_" + splitted_line[0] + ".jpg"))
            print("Image saved for {0}".format(splitted_line[0]))
        elif splitted_line[2] != '' and splitted_line[2] != "\n" and "v.redd.it" in splitted_line[2]:
            urllib.request.urlretrieve(splitted_line[2].rstrip() + "/DASH_720.mp4", filename=("img_" + splitted_line[0] + ".mp4"))
            print("Image saved for {0}".format(splitted_line[0]))
        else:
            print("No result for {0}".format(splitted_line[0]))
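One thing worth noting about the script above: the headers dictionary is built but never passed to urllib.request.urlretrieve, which has no headers parameter, so every request goes out with urllib's default User-Agent. The Access-Control-* entries are also response headers and have no effect when a client sends them. Below is a minimal sketch of attaching a browser-like User-Agent, assuming that is something the server checks. The helper names build_request and download are hypothetical.

```python
import shutil
import urllib.request

# Browser-like request headers (an assumption about what the server expects).
HEADERS = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0",
}


def build_request(url, headers=HEADERS):
    """Return a Request object that carries the given headers."""
    return urllib.request.Request(url, headers=headers)


def download(url, filename, timeout=30):
    """Stream the response body for url into filename."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp, \
            open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)
```

In the loop, a call such as download(splitted_line[2].rstrip(), "img_" + splitted_line[0] + ".jpg") could stand in for urlretrieve.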
Below is the output_reddit.csv file for reference.
2||I tried the no pet challenge... she wasn't having it||https://v.redd.it/da60x1qizgs51
3||My trip to the salon went horribly wrong.||https://v.redd.it/tfzc1vye6ds51
4||A few sketches of my macaw buddy from work. Haven't seen this silly girl in six months due to quarantine, I miss her.||https://i.redd.it/jjkb3b5ntis51.jpg
5||Thermals of the party girl!||https://i.imgur.com/rfGChUQ.jpg
6||I present you with Lorena. After rescue, I found out shes an old bird and mostly blind. Once allowed out of her cage to roam free and was given plenty of wonderful fruits and veggies, she became very warm and cuddly. Shes a very sweet regal lady and definitely a queen.||https://v.redd.it/saojuaycnds51
7||A day in the life of the OG Party Parrot. Credit: Ranger Sarah Little.||https://i.redd.it/wjwvl3u01js51.jpg
8||Party game||https://v.redd.it/8myoampepgs51
9||Here I present to you the Christmas loving partyparrot named Felix. He loved sitting in the tree but never chewed on it. Now rest in peace, little friend we will allways love and remember you.||https://i.redd.it/wengcned7is51.jpg
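Each row above has three fields separated by ||: an ID, the post title, and the media URL. The csv module only accepts a single-character delimiter, so splitting by hand is a reasonable approach. Below is a small sketch that parses a row and picks a download URL and extension with the same rules as the script. The names parse_row and pick_target are hypothetical. Unlike the script, it takes the URL from the last field, so a title that happens to contain || would not break it.

```python
def parse_row(line):
    """Split 'id||title||url' into its three fields, or return None if malformed."""
    parts = line.rstrip("\n").split("||")
    if len(parts) < 3:
        return None
    # Take the ID from the front and the URL from the back; whatever is
    # left in the middle is the title, even if it contains '||'.
    post_id, url = parts[0], parts[-1].strip()
    title = "||".join(parts[1:-1])
    return post_id, title, url


def pick_target(url):
    """Return (download_url, extension) using the script's rules, or None."""
    if not url:
        return None
    if ".png" in url:
        return url, ".png"
    if ".jpg" in url:
        return url, ".jpg"
    if "v.redd.it" in url:
        return url + "/DASH_720.mp4", ".mp4"
    return None
```

Stripping the trailing newline once in parse_row also means the image URLs are no longer passed on with a stray "\n" at the end.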
Below is the full log for reference.
Matts-MacBook-Pro-5:Download matt$ python run.py
Image saved for 2
Image saved for 3
Image saved for 4
Image saved for 5
Traceback (most recent call last):
  File "run.py", line 107, in <module>
    urllib.request.urlretrieve(splitted_line[2].rstrip() + "/DASH_720.mp4", filename=("img_" + splitted_line[0] + ".mp4"))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Solution
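The log shows entries 2 and 3, both v.redd.it links, saving fine before the v.redd.it branch fails on the next video entry, so the 403 looks specific to that one URL rather than to the script as a whole. One plausible explanation, not confirmed anywhere in the post, is that this video has no DASH_720.mp4 rendition and the host answers 403 for a file that does not exist. Below is a hedged sketch that tries several renditions in turn and skips the entry if none works. The extra file names (DASH_480.mp4 and so on) and the helper names fetch and download_video are assumptions.

```python
import urllib.error
import urllib.request

# Candidate rendition names, highest quality first. Only DASH_720.mp4 appears
# in the original script; the others are assumptions.
RENDITIONS = ("DASH_720.mp4", "DASH_480.mp4", "DASH_360.mp4", "DASH_240.mp4")


def fetch(url, filename):
    """Download url to filename; raises urllib.error.HTTPError on HTTP failures."""
    urllib.request.urlretrieve(url, filename=filename)


def download_video(base_url, filename, fetch=fetch, renditions=RENDITIONS):
    """Try each rendition; return the URL that worked, or None if all were refused."""
    base = base_url.strip().rstrip("/")
    for name in renditions:
        url = base + "/" + name
        try:
            fetch(url, filename)
            return url
        except urllib.error.HTTPError as err:
            if err.code not in (403, 404):
                raise  # unexpected status: let the caller see it
    return None
```

In the loop, this could replace the v.redd.it branch, printing the "No result" message whenever download_video returns None, so that one refused URL no longer aborts the whole run.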