python - Python webscraping error 403 + urlopen decode problem
Problem description
I'm trying to extract information from a website for the first time, so I'm scrupulously following a tutorial. I started my code as follows:
import requests
from bs4 import BeautifulSoup
url = 'https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&'
response = requests.get(url)
But when I ran my code I got a 403 error (if I understood correctly, this is because of the website's security).
Then I tried the following code as a solution:
from urllib.request import Request, urlopen
req = Request('https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&', headers={'User-Agent': 'Mozilla/5.0'})
rep = urlopen(req).read()
But when I continued my tutorial with the following code:
if response.ok:
    print(response.text)
I got these errors: AttributeError: 'bytes' object has no attribute 'ok' and AttributeError: 'bytes' object has no attribute 'text'
Then I tried "decode":
response = rep.decode('utf-8')
But I got: AttributeError: 'str' object has no attribute 'ok' and AttributeError: 'str' object has no attribute 'text'
I'm a little lost. Is there any way to solve this problem and get the same result that my tutorial's code provides?
Solution
There is probably nothing wrong with your approach; maybe you are requesting too aggressively. I could not reproduce the behavior. The following code gets all the headlines from that URL.
Example
import requests
from bs4 import BeautifulSoup
headers = {"user-agent": "Mozilla/5.0"}
url = 'https://www.unep.org/resources?f[0]=category%3A451&f[1]=category%3A452&f[2]=category%3A453&f[3]=category%3A454&f[4]=category%3A455&f[5]=type%3A55&keywords=&'
r = requests.get(url, headers=headers)
# Name the parser explicitly so BeautifulSoup does not have to guess one
print(*[h.get_text() for h in BeautifulSoup(r.text, 'html.parser').select('h5')], sep='\n')
Output
Serving up sustainable food
Can coral reef restoration save one of the most vulnerable ecosystems to climate change?
Remembrance Forests in Brazil: 200,000 trees for 200,000 COVID-19 victims
State of Planet Podcast
Implementation in nature-based solutions
Finance for adaptation
Planning for Adaptation
As climate change hits harder, world must increase efforts to adapt
Good news for Africa’s Great Green Wall
Five things to know about desalination
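The AttributeError messages in the question come from the fact that urlopen(req).read() returns raw bytes (and .decode() returns a plain str), while the tutorial's response.ok and response.text are attributes of the Response object that requests.get() returns. If you would rather stay with urllib, a minimal sketch along the following lines could expose the same two attributes. SimpleResponse and fetch are hypothetical names introduced here for illustration; they are not part of urllib or requests, and the "below 400" rule only approximates what requests does.

```python
from dataclasses import dataclass
from urllib.error import HTTPError
from urllib.request import Request, urlopen


@dataclass
class SimpleResponse:
    """Hypothetical stand-in exposing the two attributes the tutorial uses."""
    status: int
    body: bytes
    encoding: str = "utf-8"

    @property
    def ok(self) -> bool:
        # Approximates requests' notion of "ok": any status below 400.
        return self.status < 400

    @property
    def text(self) -> str:
        # Decode the raw bytes; undecodable bytes are replaced, not raised.
        return self.body.decode(self.encoding, errors="replace")


def fetch(url: str, user_agent: str = "Mozilla/5.0") -> SimpleResponse:
    """Fetch a URL with a browser-like User-Agent (an assumed default)."""
    req = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(req) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return SimpleResponse(resp.status, resp.read(), charset)
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; keep the status so .ok can report it.
        return SimpleResponse(err.code, err.read())


if __name__ == "__main__":
    # Offline demonstration; call fetch(url) to hit a real site instead.
    demo = SimpleResponse(200, b"<h5>hello</h5>")
    if demo.ok:
        print(demo.text)
```

With this in place, response = fetch(url) lets the tutorial's "if response.ok: print(response.text)" run unchanged, although passing headers to requests.get() as in the example above remains the shorter route.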