Python BeautifulSoup | HTTP Error: Forbidden

Problem description

I'm stuck here: the code raises HTTPError: Forbidden on line 4. It works when I try other websites, but not on this one. Why?

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import urllib.request

sauce=urllib.request.urlopen("https://socialblade.com/youtube/top/50").read()
soup=bs(sauce,'lxml')
print(soup)

Tags: python, web-scraping, beautifulsoup

Solution


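Specify the User-Agent HTTP header to get a proper response from the server. By default urllib identifies itself with a User-Agent of the form Python-urllib/3.x, and some servers answer such requests with 403 Forbidden. For example: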

import urllib.request
from bs4 import BeautifulSoup as bs

url = "https://socialblade.com/youtube/top/50"
# Send a browser-like User-Agent instead of urllib's default one
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}

# Attach the headers to a Request object, then open it
req = urllib.request.Request(url, headers=headers)
response = urllib.request.urlopen(req)
# Parse the HTML with Python's built-in parser and pretty-print it
soup = bs(response.read(), 'html.parser')
print(soup.prettify())

Prints:

<!DOCTYPE html>
<head>
 <title>
  Top 50 YouTubers sorted by SB Score - Socialblade YouTube Stats | YouTube Statistics
 </title>

...
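
If the server still refuses the request, it can help to catch the error and look at the status code rather than let the traceback end the script. The following is only a minimal sketch of that idea, not part of the original answer: the names BROWSER_UA, build_request and fetch_html are hypothetical helpers introduced here for illustration, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request

# Hypothetical default: the same browser-like User-Agent string used in the answer.
BROWSER_UA = ("Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) "
              "Gecko/20100101 Firefox/80.0")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the server sends an HTTP error."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (403 for Forbidden); err.reason is its text.
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
```

Calling fetch_html("https://socialblade.com/youtube/top/50") would then return the page bytes, ready to pass to BeautifulSoup, or None after printing the status. A browser-like User-Agent is not a guarantee: a server may still reject the request for other reasons.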
