python-3.x - BeautifulSoup: Request or Requests?
Problem description
I have a problem when I use BeautifulSoup with urllib.request:
from urllib.request import urlopen
from bs4 import BeautifulSoup

def get_title_and_text(url):
    page = urlopen(url).read().decode('utf8')
    soup = BeautifulSoup(page)
    text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
    return soup.title.text, text
I get nice output like this:
Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
* Share this with Email Facebook Messenger Messenger Twitter Pinterest WhatsApp LinkedIn Copy this link These are external links and will open in a new window Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.
But when I use BeautifulSoup with requests:
import requests
from bs4 import BeautifulSoup

def get_title_and_text(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    feed = BeautifulSoup(soup.decode('utf8'))
    text = ' '.join(map(lambda p: p.text, feed.find_all('p')))
    return soup.title.text, text
I get ugly output like this:
Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
*
Share this with
Email
Facebook
Messenger
Messenger
Twitter
Pinterest
WhatsApp
LinkedIn
Copy this link
These are external links and will open in a new window
Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.
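My concern is that I can't use BeautifulSoup with urllib.request, because I get an HTTP 403 Forbidden error, so I need to use requests. How can I get the same nice output with requests that I get with urllib.request?
Solution
I suggest you stick with urllib.request, but pass a Request with a User-Agent header to fix the HTTP 403 Forbidden error: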
Request(url, headers={'User-Agent': 'Mozilla/5.0'})
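A fuller sketch of that suggestion, for reference. It assumes the same paragraph extraction as in the question; the function names `build_request`, `extract_title_and_text` and `fetch_title_and_text` are hypothetical, and the explicit `'html.parser'` argument is an assumed parser choice, not something the question specifies.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def build_request(url):
    # Hypothetical helper: attach a browser-like User-Agent, as suggested above.
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})


def extract_title_and_text(html):
    # Same extraction as in the question; 'html.parser' is an assumed choice.
    soup = BeautifulSoup(html, 'html.parser')
    text = ' '.join(p.text for p in soup.find_all('p'))
    return soup.title.text, text


def fetch_title_and_text(url):
    # Performs the network request; decoding as UTF-8 follows the question's code.
    with urlopen(build_request(url)) as response:
        page = response.read().decode('utf8')
    return extract_title_and_text(page)
```

Calling `fetch_title_and_text(url)` with the article URL should then return the title and the joined paragraph text, provided the site accepts this User-Agent.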
Hope this helps!
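If requests is still preferred, the extra line breaks probably come from the second parse rather than from requests itself. As far as I know, the first positional parameter of `BeautifulSoup.decode()` controls pretty-printing, not the encoding, so `soup.decode('utf8')` likely re-indents the markup before it is parsed again. The sketch below illustrates the effect on made-up HTML, using `prettify()` to make the pretty-printing explicit; it is an illustration under those assumptions, not a confirmed diagnosis of the asker's page.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page.
HTML = ('<html><head><title>T</title></head><body>'
        '<p><span>Share this with</span> <a>Email</a> <a>Facebook</a></p>'
        '</body></html>')

soup = BeautifulSoup(HTML, 'html.parser')

# Parse once and read the paragraphs directly from that tree.
direct_text = ' '.join(p.text for p in soup.find_all('p'))

# Pretty-print the tree and parse it again: every tag and string now sits
# on its own line, so the paragraph text picks up newlines.
feed = BeautifulSoup(soup.prettify(), 'html.parser')
reparsed_text = ' '.join(p.text for p in feed.find_all('p'))

print(repr(direct_text))
print(repr(reparsed_text))
```

Under that reading, dropping the `feed = ...` line from the requests version and calling `find_all('p')` on `soup` directly should give the single-line output.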