Python: error when trying to scrape YouTube

Problem description

I am trying to scrape the title of every video on YouTube's home page by running this code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com'
html = requests.get(url)
soup = BeautifulSoup(html.content, "html.parser")
print(soup('a'))

and it returns this error:

Traceback (most recent call last):
  File "C:\Users\kenda\OneDrive\Desktop\Projects\youtube.py", line 7, in <module>
    print(soup('a'))
  File "C:\Users\kenda\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f384' in position 45442: character maps to <undefined>
[Finished in 4.83s]

How do I fix this? And why does it happen specifically when I scrape YouTube?

Tags: python, web-scraping, python-requests

Solution

urllib, part of the standard library, is a good alternative and is comfortable to use.

from urllib.request import urlopen

from bs4 import BeautifulSoup

The urlopen function fetches the URL and returns the HTML response:

url = 'https://www.youtube.com'
html = urlopen(url)

BeautifulSoup then parses the HTML:

soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
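
The question asks for titles rather than raw tags. As a minimal sketch, assuming the text of each link is what is wanted, the anchors can be reduced to (text, href) pairs. The markup in sample_html and the helper name link_texts are hypothetical, not YouTube's actual HTML; the real page may build much of its content with JavaScript, so the anchors in the downloaded HTML may not carry video titles.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a live request
sample_html = """
<a href="/watch?v=abc" title="First video">First video</a>
<a href="/watch?v=def">Second video \U0001f384</a>
<a href="/about"></a>
"""


def link_texts(html):
    """Return (text, href) pairs for anchors that have visible text."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for a in soup.find_all("a"):
        text = a.get_text(strip=True)
        if text:  # skip anchors with no visible text
            pairs.append((text, a.get("href")))
    return pairs


for text, href in link_texts(sample_html):
    # ascii() escapes non-ASCII characters so any console can print them
    print(ascii(text), href)
```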

If you absolutely want to do it with requests, the solution is:

import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com'
resp = requests.get(url)
html = resp.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.find_all('a'))
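
The traceback itself points at the console encoder (cp1252.py) rather than at the HTTP library: print fails because the Windows console encoding cannot represent '\U0001f384', an emoji that happens to appear in YouTube's HTML. A minimal sketch of one possible workaround, assuming the goal is just to print without crashing, is to escape unencodable characters before printing. The helper name safe_for_console is hypothetical, and the sample string stands in for the scraped page.

```python
import sys


def safe_for_console(text, encoding=None):
    """Return text with characters the target encoding cannot represent escaped."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # backslashreplace turns unencodable characters into \uXXXX / \UXXXXXXXX escapes
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# Stand-in for the scraped HTML (assumption: the real page contains this emoji)
sample = "Merry Christmas \U0001f384"

# Reproduce the failure from the traceback without touching the network
try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode:", ascii(exc.object[exc.start:exc.end]))

# Printing the escaped version works even on a cp1252 console
print(safe_for_console(sample, "cp1252"))
```

With the scraping code, this could be used as print(safe_for_console(str(soup.find_all('a')))). Writing the output to a file opened with encoding='utf-8' is another option.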
