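python - AttributeError: 'NoneType' when scraping tables in Python with Beautiful Soup
Problem description
I am trying to collect data for many countries and many years, and have set up a list containing the URL for each country.
Here is my code: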
    for l in range(0, len(league_urls)):
        time.sleep(0.5)
        #The second loop is for each year we want to scrape
        for n in range(2007,2020):
            time.sleep(0.5)
            df_soccer1 = None
            url = league_urls[l] + str(n) + str('&altersklasse=alle')
            headers = {"User-Agent":"Mozilla/5.0"}
            response = requests.get(url, headers=headers, verify=False)
            time.sleep(0.5)
            soup = BeautifulSoup(response.text, 'html.parser')
            #Table 1 with information about the value
            table = soup.find("table", {"class" : "items"})
            team = []
            players_used = []
            minutes_nonforeign = []
            minutes_foreign = []
            for row in table.find_all('tr')[1:]:
                try:
                    col = row.find_all('td')
                    team_ = col[1].text
                    players_used_ = col[2].text
                    minutes_nonforeign_ = col[3].text
                    minutes_foreign_ = col[4].text
                    team.append(team_)
                    players_used.append(players_used_)
                    minutes_nonforeign.append(minutes_nonforeign_)
                    minutes_foreign.append(minutes_foreign_)
                except:
                    team.append('')
                    players_used.append('')
                    minutes_nonforeign.append('')
                    minutes_foreign.append('')
            team = [elem.replace('\n','').replace('\xa0','').strip() for elem in team]
            #Table 2 with information about placement, goals and points
            df_soccer2 = None
            table2 = soup.find("div", {"class" : "box tab-print"})
            team2 = []
            place = []
            matches = []
            difference = []
            pts = []
            for row in table2.find_all('tr'):
                try:
                    col = row.findAll('td')
                    team2_ = col[2].text
                    place_ = col[0].text
                    matches_ = col[3].text
                    difference_ = col[4].text
                    pts_ = col[5].text
                    team2.append(team2_)
                    place.append(place_)
                    matches.append(matches_)
                    difference.append(difference_)
                    pts.append(pts_)
                except:
                    team2.append('')
                    place.append('')
                    matches.append('')
                    difference.append('')
                    pts.append('')
            team2 = [elem.replace('\n','').replace('\xa0','').strip() for elem in team2]
            df_soccer1 = pd.DataFrame({'Team': team[1:], 'Season': [n]*(len(team)-1), 'Players used': players_used[1:],
                                       'Minutes nonforeign': minutes_nonforeign[1:], 'Minutes foreign': minutes_foreign[1:]})
            df_soccer2 = pd.DataFrame({'Team': team2, 'Place': place, 'Matches': matches, 'Difference': difference,
                                       'Points': pts})
When scraping the first table I get this error:
AttributeError Traceback (most recent call last)
<ipython-input-46-b4cd681f68e8> in <module>
21 minutes_foreign = []
22
---> 23 for row in table.find_all("tr")[1:]:
24 try:
25 col = row.find_all('td')
AttributeError: 'NoneType' object has no attribute 'find_all'
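Note that league_urls is a long list of URLs.
I used similar code on another part of the website and it worked fine. I cannot figure out why it does not work for this part.
Also, when I run the code with only one URL, it works fine. Could the problem be caused by looping over 12 years for 55 different URLs?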
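Solution
Check whether table is None before iterating over it. soup.find() returns None when no matching element exists on the page, and calling find_all on None is what raises the AttributeError. For example: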
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.transfermarkt.com/remier-liga/legionaereeinsaetze/wettbewerb/RU1/plus/?option=spiele&saison_id=2011&altersklasse=alle'
    headers = {"User-Agent":"Mozilla/5.0"}
    response = requests.get(url, headers=headers, verify=False)
    #time.sleep(0.5)
    soup = BeautifulSoup(response.text, 'html.parser')
    #Table 1 with information about the value
    table = soup.find("table", {"class" : "items"})
    team = []
    players_used = []
    minutes_nonforeign = []
    minutes_foreign = []
    if table is not None:
        # Skip the header row, then read one data row at a time
        for row in table.find_all('tr')[1:]:
            try:
                col = row.find_all('td')
                team_ = col[1].text
                players_used_ = col[2].text
                minutes_nonforeign_ = col[3].text
                minutes_foreign_ = col[4].text
                team.append(team_)
                players_used.append(players_used_)
                minutes_nonforeign.append(minutes_nonforeign_)
                minutes_foreign.append(minutes_foreign_)
            except IndexError:
                # Row has fewer cells than expected: keep a placeholder
                team.append('')
                players_used.append('')
                minutes_nonforeign.append('')
                minutes_foreign.append('')
    else:
        # No table on this page: record a single empty entry
        team.append('')
        players_used.append('')
        minutes_nonforeign.append('')
        minutes_foreign.append('')
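The same None check can be moved into a small helper so the loop over leagues and seasons stays short. The following is only a sketch: the function name parse_items_table and the sample HTML are made up for illustration and are not part of the original answer. Unlike the code above, it skips short rows and returns an empty list for a missing table instead of appending empty placeholders.

```python
from bs4 import BeautifulSoup


def parse_items_table(html):
    """Return one dict per data row of the 'items' table, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "items"})
    if table is None:
        # soup.find() found no matching <table>, so there is nothing to parse
        return []
    rows = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        col = row.find_all("td")
        if len(col) < 5:
            continue  # not a full data row
        rows.append({
            "Team": col[1].get_text(strip=True),
            "Players used": col[2].get_text(strip=True),
            "Minutes nonforeign": col[3].get_text(strip=True),
            "Minutes foreign": col[4].get_text(strip=True),
        })
    return rows


# Made-up HTML, for illustration only
sample = """
<table class="items">
  <tr><th>#</th><th>Club</th><th>Players used</th><th>Non-foreign</th><th>Foreign</th></tr>
  <tr><td>1</td><td> Club A </td><td>25</td><td>1000</td><td>2000</td></tr>
</table>
"""
print(parse_items_table(sample))
print(parse_items_table("<html><body>Access denied</body></html>"))
```

Inside the loop, an empty result from parse_items_table(response.text) could be used to record the url that failed, which may help show whether specific league and season combinations have no table or whether requests start failing after many iterations.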