BeautifulSoup error: TypeError: object of type 'NoneType' has no len()

Problem description

While using BeautifulSoup to parse a URL, I ran into this error:

Traceback (most recent call last):
  File "/Users/justinhudacsko/PycharmProjects/SportsBot/scrape.py", line 8, in <module>
    stats_page = BeautifulSoup(comment, "lxml")
  File "/usr/local/lib/python3.9/site-packages/bs4/__init__.py", line 310, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()

My code is:

from urllib.request import urlopen
from bs4 import BeautifulSoup, Comment

url = 'https://www.pro-football-reference.com/years/2020/draft.htm'
html = urlopen(url)
soup = BeautifulSoup(html, "lxml")
comment = soup.find(text=lambda text: isinstance(text, Comment) and 'class="table_outer_container"' in text) #THIS RETURNS NONE
stats_page = BeautifulSoup(comment, "lxml")

Why is the variable comment None, even though class="table_outer_container" does occur in the page at this URL?

Tags: python, web-scraping, beautifulsoup

Solution


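The find call you are using only returns an HTML comment whose text contains 'class="table_outer_container"'. If no comment matches, find returns None, and passing None to BeautifulSoup raises the TypeError shown above. I assume what you actually want is the content of the element whose class is table_outer_container.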

You can do that as follows:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.pro-football-reference.com/years/2020/draft.htm'
html = urlopen(url)
soup = BeautifulSoup(html, "lxml")
table = soup.find('div', class_='table_outer_container')
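
Because find returns None when nothing matches, it may help to check the result before using it. The following is only a minimal sketch, not the original answerer's code: the helper name find_table_container, the inline sample HTML, and the choice of the built-in "html.parser" (used so the example runs without lxml) are all assumptions made for illustration. It looks for the div in the regular document tree first and only then falls back to markup wrapped in HTML comments.

```python
from bs4 import BeautifulSoup, Comment


def find_table_container(markup, parser="html.parser"):
    """Hypothetical helper: return the first div.table_outer_container, or None."""
    soup = BeautifulSoup(markup, parser)

    # Case 1: the div is part of the regular document tree.
    div = soup.find("div", class_="table_outer_container")
    if div is not None:
        return div

    # Case 2: the div is wrapped in an HTML comment, so re-parse the comment text.
    for node in soup.find_all(string=True):
        if isinstance(node, Comment) and "table_outer_container" in node:
            inner = BeautifulSoup(node, parser)
            div = inner.find("div", class_="table_outer_container")
            if div is not None:
                return div

    # Nothing matched: return None explicitly so the caller can handle it.
    return None


# Made-up sample inputs (assumptions, not taken from the real page).
live = '<div class="table_outer_container"><table id="drafts"></table></div>'
commented = (
    '<div id="all"><!-- <div class="table_outer_container">'
    '<table id="x"></table></div> --></div>'
)
empty = "<p>no table here</p>"

for sample in (live, commented, empty):
    container = find_table_container(sample)
    if container is None:
        print("no table_outer_container found")
    else:
        print("found table with id:", container.table["id"])
```

With the real page, the same helper could be given the result of urlopen(url).read(). The caller still has to handle a None return instead of passing it on to BeautifulSoup.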
