BeautifulSoup Request or Requests?

Question

I have run into a problem. When I fetch the page with urllib's urlopen and parse it with BeautifulSoup:

 page = urlopen(url).read().decode('utf8')
 soup = BeautifulSoup(page)
 text = ' '.join(map(lambda p: p.text, soup.find_all('p')))
 return soup.title.text, text

I get clean output like this:

Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
*  Share this with Email Facebook Messenger Messenger Twitter Pinterest WhatsApp LinkedIn Copy this link These are external links and will open in a new window Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.

But when I fetch the page with requests and parse it with BeautifulSoup:

 page = requests.get(url)
 soup = BeautifulSoup(page.content, 'html.parser')
 feed = BeautifulSoup(soup.decode('utf8'))
 text = ' '.join(map(lambda p: p.text, feed.find_all('p')))
 return soup.title.text, text

I get ugly output like this:

Coronavirus: Johnson sets out 'ambitious' economic recovery plan - BBC News
* 

 
                    Share this with
                    
                       Email
                       
                       Facebook
                       
                       Messenger
                       
                       Messenger
                       
                       Twitter
                       
                       Pinterest
                       
                       WhatsApp
                       
                       LinkedIn
                       
                    Copy this link
                    
                    These are external links and will open in a new window
                    
             Boris Johnson has said now is the time to be "ambitious" about the UK's future, as he set out a post-coronavirus recovery plan.
* Infrastructure projects in England would be "accelerated" and there would be investment in new academy schools, green buses and new broadband, the PM added.

My concern is that I can't use urllib's Request, because I get an HTTP 403 Forbidden error, so I need to use requests. How can I get the same clean output with requests that I get with urllib's Request?

Tags: python-3.x, beautifulsoup

Answer


I suggest you stick with urllib's Request, and fix the HTTP 403 Forbidden error by sending a browser-like User-Agent header:

Request(url, headers={'User-Agent': 'Mozilla/5.0'})
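
For context, here is a minimal sketch of how that Request object could slot into the urlopen code from the question. The helper names build_request, fetch_html and collapse_whitespace are hypothetical, made up for this illustration, and UTF-8 is assumed as the page encoding because the question decodes with 'utf8'. The last helper is an optional extra for anyone who stays with requests: it squeezes the runs of spaces and newlines seen in the ugly output into single spaces.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent='Mozilla/5.0'):
    """Return a urllib Request that sends a browser-like User-Agent header.

    Hypothetical helper name; the header value is the one from the answer.
    """
    return Request(url, headers={'User-Agent': user_agent})


def fetch_html(url):
    """Download url and return the body as text.

    Assumption: the page is UTF-8 encoded, as in the question's code.
    """
    with urlopen(build_request(url)) as response:
        return response.read().decode('utf8')


def collapse_whitespace(text):
    """Replace every run of spaces, tabs and newlines with one space."""
    return ' '.join(text.split())
```

The string returned by fetch_html(url) can be passed to BeautifulSoup in place of page in the first snippet of the question. If you keep requests instead, applying collapse_whitespace to each p.text before joining should give the same single-line form.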

Hope this helps!

