Python web scraping: error 403 + urlopen decode problem

Question

I'm currently trying to extract information from a website for the first time, so I'm scrupulously following a tutorial. I started my code as follows:

import requests
from bs4 import BeautifulSoup


url = 'https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&'

response = requests.get(url)

But when I ran my code I got a 403 error (if I understood correctly, it's because of the website's security).

Then I tried the following code as a solution:

from urllib.request import Request, urlopen

req = Request('https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&', headers={'User-Agent': 'Mozilla/5.0'})
rep = urlopen(req).read()

But when I continue my tutorial with the following code:

if response.ok:
    print(response.text)

I get errors saying: AttributeError: 'bytes' object has no attribute 'ok' and AttributeError: 'bytes' object has no attribute 'text'

Then I tried "decode":

response = rep.decode('utf-8')

But I got: AttributeError: 'str' object has no attribute 'ok' and AttributeError: 'str' object has no attribute 'text'

I'm a little lost. Is there any way to solve this problem and obtain the same result that my tutorial's code provides?

Tags: python, web-scraping

Answer


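I think there is probably nothing wrong with the approach; maybe you are requesting too aggressively. I could not reproduce the 403: the following code, which passes a User-Agent header to requests.get, gets all the headlines from that URL. Because r is still a requests Response object, r.ok and r.text work as in your tutorial, whereas urlopen(req).read() returns bytes, which have neither attribute.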

Example

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with the request.
headers = {"user-agent": "Mozilla/5.0"}
url = 'https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&'
r = requests.get(url, headers=headers)

# Print the text of every <h5> headline, one per line.
print(*[h.get_text() for h in BeautifulSoup(r.text, 'html.parser').select('h5')], sep='\n')

Output

Serving up sustainable food
Can coral reef restoration save one of the most vulnerable ecosystems to climate change?
Remembrance Forests in Brazil: 200,000 trees for 200,000 COVID-19 victims
State of Planet Podcast
Implementation in nature-based solutions
Finance for adaptation
Planning for Adaptation
As climate change hits harder, world must increase efforts to adapt
Good news for Africa’s Great Green Wall
Five things to know about desalination
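
If you would rather stay with urllib, as in the question's second attempt, one option is to wrap the raw bytes in a small object that exposes the two attributes the tutorial uses. This is only a rough sketch and not part of requests or urllib: SimpleResponse and fetch are hypothetical names made up for this illustration, and the UTF-8 fallback is an assumption for pages that do not declare a charset.

```python
from dataclasses import dataclass
from urllib.error import HTTPError
from urllib.request import Request, urlopen


@dataclass
class SimpleResponse:
    """Hypothetical stand-in offering the .ok and .text the tutorial relies on."""
    status_code: int
    content: bytes
    encoding: str = "utf-8"

    @property
    def ok(self) -> bool:
        # 4xx and 5xx statuses count as failures
        return self.status_code < 400

    @property
    def text(self) -> str:
        # Decode the raw bytes; undecodable bytes are replaced instead of raising
        return self.content.decode(self.encoding, errors="replace")


def fetch(url: str, user_agent: str = "Mozilla/5.0") -> SimpleResponse:
    """Fetch url with urllib and wrap the result (hypothetical helper)."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req) as resp:
            # Assumption: fall back to UTF-8 when the server declares no charset
            charset = resp.headers.get_content_charset() or "utf-8"
            return SimpleResponse(resp.status, resp.read(), charset)
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; keep the status and body instead of crashing
        return SimpleResponse(err.code, err.read())
```

With this in place, response = fetch(url) followed by the tutorial's "if response.ok: print(response.text)" should run unchanged, although simply passing headers to requests.get as shown above is the shorter route.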

