Web Scraping FBref Table

Problem description

So far my code works for different tables on the FBref website, but I am struggling to get the player details. The code below:

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = BeautifulSoup(soup.select_one('#stats_standard').find_next(text=lambda x: isinstance(x, Comment)), 'html.parser')

#print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))

gives me this error:

AttributeError: 'NoneType' object has no attribute 'find_next'

Tags: python, beautifulsoup, domain-data-modelling

Solution


What happened?

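There is no table with the id stats_standard; the id of the table is stats_standard_10728. Because nothing matches, soup.select_one('#stats_standard') returns None, and calling find_next on None raises the AttributeError.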
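
To see this behaviour without hitting the network, here is a minimal sketch. The HTML string is a hypothetical, heavily simplified stand-in for the page (not the real FBref markup); only the id stats_standard_10728 is taken from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup (assumption, not real FBref HTML).
html = """
<div id="all_stats_standard">
  <table id="stats_standard_10728">
    <tr><th>Player</th><td>ENG</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# No element has exactly id="stats_standard", so select_one returns None.
missing = soup.select_one("#stats_standard")
print(missing)

# Listing the ids of the tables that do exist shows what the selector should target.
table_ids = [t.get("id") for t in soup.find_all("table")]
print(table_ids)
```

Printing the available ids like this is a quick way to check a selector before chaining further calls onto its result.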

How to fix it and make it a bit more generic

Change your table selector to the following, which matches any table whose id starts with stats_standard:

table = soup.select_one('table[id^="stats_standard"]')

Example

import requests
from bs4 import BeautifulSoup


url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = soup.select_one('table[id^="stats_standard"]')

#print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
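
If the id suffix changes or the table is missing, the loop above would fail with the same AttributeError. A minimal sketch of a guard is shown below; find_table_by_id_prefix is a hypothetical helper name, and the sample markup and player rows are made up for illustration, not taken from FBref.

```python
from bs4 import BeautifulSoup


def find_table_by_id_prefix(soup, prefix):
    """Return the first <table> whose id starts with prefix, or raise a clear error."""
    table = soup.select_one(f'table[id^="{prefix}"]')
    if table is None:
        available = [t.get("id") for t in soup.find_all("table")]
        raise LookupError(f"No table id starts with {prefix!r}; found ids: {available}")
    return table


# Hypothetical sample markup (assumption): one header row and two data rows.
html = """
<table id="stats_standard_10728">
  <tr><th>Player</th><th>Nation</th><th>Pos</th></tr>
  <tr><th>Player A</th><td>ENG</td><td>FW</td></tr>
  <tr><th>Player B</th><td>BRA</td><td>DF</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = find_table_by_id_prefix(soup, "stats_standard")

# Keep only rows that contain at least one <td>, which skips the header row.
rows = []
for tr in table.select("tr:has(td)"):
    rows.append([cell.get_text(strip=True) for cell in tr.select("th, td")])
print(rows)
```

Selecting "th, td" in the inner loop also picks up a row-header cell, in case the first column of a row is marked up as th rather than td, as it is in this sample.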

Just in case

You can make your life easier by using pandas read_html() to scrape, display, and modify the table data.

Example

import pandas as pd
pd.read_html('https://fbref.com/en/squads/18bb7c10/Arsenal-Stats')[0]
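
read_html() returns a list of DataFrames, one per table it finds, so [0] simply takes the first one. When a page holds several tables, the attrs parameter can narrow the result to a table with a specific id. The sketch below runs on a hypothetical inline HTML string instead of the live page, and assumes an HTML parser backend for pandas (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup (assumption): two tables, only one of which we want.
html = """
<table id="other_table">
  <thead><tr><th>A</th></tr></thead>
  <tbody><tr><td>1</td></tr></tbody>
</table>
<table id="stats_standard_10728">
  <thead><tr><th>Player</th><th>Nation</th><th>Pos</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>ENG</td><td>FW</td></tr>
    <tr><td>Player B</td><td>BRA</td><td>DF</td></tr>
  </tbody>
</table>
"""

# attrs filters on exact attribute values, so the full id is needed here.
tables = pd.read_html(StringIO(html), attrs={"id": "stats_standard_10728"})
df = tables[0]
print(df)
```

Note that attrs matches the id exactly, unlike the prefix selector used with BeautifulSoup above, so this variant breaks if the numeric suffix changes.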
