Problem extracting a table from Basketball Reference with BeautifulSoup

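Problem description

I want to extract a specific table, the one with id = "all_team-stats-per_game", and I am trying to pull out its column headers. I can locate the element with that id correctly, but I am not sure why the output is empty when I search it for "tr" tags. The code is attached below. Thanks in advance.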

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

# NBA season we will be analyzing
year = 2019

url = "https://www.basketball-reference.com/leagues/NBA_2019.html"

# this is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")

# use findALL() to get the column headers
# soup.findAll('tr', limit=2)

soup = soup.find(id="all_team-stats-per_game")

print(soup.find_all('th'))
#
# headers = [th.getText() for th in soup[0].findAll('th')]
#
# print(headers)

Tags: python-3.x, machine-learning, beautifulsoup, pycharm

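Solution

I tried editing your code. I was able to find the required div tag, but the table inside it is wrapped in an HTML comment, which I also confirmed with the browser's inspect tool. BeautifulSoup treats a comment as a single text node rather than as parsed tags, so that is probably why searching the div for "tr" or "th" returns nothing.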

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

# NBA season we will be analyzing
year = 2019

url = "https://www.basketball-reference.com/leagues/NBA_2019.html"

# this is the HTML from the given URL
html = urlopen(url)
soup = BeautifulSoup(html, features="html.parser")

# look up the wrapper div by its id
target_div = soup.find("div", {"id": "all_team-stats-per_game"})

# print the div's markup; the table shows up inside an HTML comment (<!-- ... -->)
print(target_div.prettify())
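
If the table really is inside a comment, one possible way to reach it is to pull the comment text out of the div and parse that text a second time. The following is only a minimal sketch under that assumption: the function name headers_from_commented_table and the miniature sample_html are made up for illustration and are not taken from the real page, whose markup may differ or change.

```python
from bs4 import BeautifulSoup, Comment


def headers_from_commented_table(html, wrapper_id):
    """Return the <th> texts from the first row of a table hidden in an HTML comment.

    Hypothetical helper: looks inside the element with the given id for
    comment nodes and re-parses each one until a <table> is found.
    """
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find(id=wrapper_id)
    if wrapper is None:
        return []

    # Comment nodes are plain strings, so their markup must be parsed again.
    for node in wrapper.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(node, "html.parser")
        table = inner.find("table")
        if table is None:
            continue
        first_row = table.find("tr")
        if first_row is None:
            return []
        return [th.get_text(strip=True) for th in first_row.find_all("th")]
    return []


# Made-up miniature of the page structure, for illustration only.
sample_html = """
<div id="all_team-stats-per_game">
  <!--
  <table id="team-stats-per_game">
    <thead><tr><th>Rk</th><th>Team</th><th>PTS</th></tr></thead>
    <tbody><tr><th>1</th><td>Example Team</td><td>100.0</td></tr></tbody>
  </table>
  -->
</div>
"""

print(headers_from_commented_table(sample_html, "all_team-stats-per_game"))
```

With the live page, the same function could be given the HTML returned by urlopen(url).read() instead of sample_html. Whether the real table has a single header row or several is not something this sketch checks, so the result may need filtering.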
