html - Empty list when scraping a table with find_parent and findAll
Question
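I am working on a project to record the daily player counts of certain servers in a game, to see how they evolve over time. The data is in a table where each server is a 'tr' containing several 'td' cells, some holding information such as the number of players and some holding information I do not need. I have managed to select all the 'tr' rows I am interested in and discard the ones I do not want, but now I am stuck trying to select, within each 'tr', only the 'td' that holds the number of players.
This is the code I have written so far: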
import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv
from pprint import pprint
from datetime import date

url = 'https://www.tibia.com/community/?subtopic=worlds'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
file = open('players_online', 'a')
writer = csv.writer(file)
list_of_players = list()
# Note: `worlds` is not defined in the posted snippet; judging by the
# output below it holds the names of the worlds of interest.
finding_td = soup.find_all('a', string=worlds)
for looking_for_players in finding_td:
    parent_tr = looking_for_players.find_parent('tr')
    names1 = [clean_data.findAll('td') for clean_data in parent_tr]
    list_of_players.append(parent_tr)
If I run print(finding_td), I get the following:
[<a href="https://www.tibia.com/community/?subtopic=worlds&world=Astera">Astera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Belobra">Belobra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Calmera">Calmera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Celebra">Celebra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Gentebra">Gentebra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Kalibra">Kalibra</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Luminera">Luminera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Menera">Menera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Nefera">Nefera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Pacera">Pacera</a>,
<a href="https://www.tibia.com/community/?subtopic=worlds&world=Yonabra">Yonabra</a>]
That is what I want. Next I use find_parent, and when I run print(parent_tr) I get:
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Belobra">Belobra</a></td><td style="text-align: right;">731</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since June 22, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Calmera">Calmera</a></td><td style="text-align: right;">318</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Celebra">Celebra</a></td><td style="text-align: right;">559</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since October 29, 2018.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Gentebra">Gentebra</a></td><td style="text-align: right;">757</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Kalibra">Kalibra</a></td><td style="text-align: right;">716</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Luminera">Luminera</a></td><td style="text-align: right;">295</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Menera">Menera</a></td><td style="text-align: right;">364</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Nefera">Nefera</a></td><td style="text-align: right;">465</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since April 19, 2018.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Pacera">Pacera</a></td><td style="text-align: right;">336</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&world=Yonabra">Yonabra</a></td><td style="text-align: right;">446</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '<p>On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since May 27, 2020.</p>', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
So far so good. Now that I have all the 'td' cells, I want a line that selects only the 'td' containing the number of players. I do it like this:
names1 = [clean_data.findAll('td') for clean_data in parent_tr]
But when I append or print it, I get:
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
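If I use names1 = [clean_data.find('td')[3] for clean_data in parent_tr] to pick the specific 'td' that holds the data I want, the console says:
"IndexError: list index out of range".
That makes sense, since the list is empty after all. Any idea what is going wrong?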
Answer
To get the name and population of each regular world, you can try:
import requests
from bs4 import BeautifulSoup

url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# [2] picks the third .TableContent element, used here as the regular worlds table
for a in soup.select(".TableContent")[2].select("td > a"):
    name = a.get_text(strip=True)
    # the population is in the first <td> that follows the link
    pop = a.find_next("td").get_text(strip=True)
    print("{:<30} {}".format(name, pop))
Prints:
Adra 54
Antica 271
Assombra 166
Astera 419
Belluma 18
Belobra 743
Bona 150
Calmera 326
Carnera 73
Celebra 560
Celesta 82
Concorda 51
Cosera 103
Damora 89
Descubra 524
Dibra 379
Duna 6
Emera 70
Epoca 23
Estela 129
Faluna 24
Ferobra 599
Firmera 180
Funera 83
Furia 14
Garnera 299
Gentebra 769
Gladera 464
Harmonia 112
Helera 55
Honbra 629
Impera 340
Inabra 636
Javibra 229
Jonera 131
Kalibra 700
Karna 175
Kenora 90
Libertabra 364
Lobera 473
Luminera 293
Lutabra 469
Macabra 277
Menera 365
Mitigera 100
Monza 112
Mudabra 427
Nefera 475
Noctera 195
Nossobra 252
Olera 87
Ombra 601
Optera 186
Pacembra 402
Pacera 352
Peloria 226
Premia 74
Pyra 4
Quelibra 578
Quintera 280
Ragna 15
Refugia 103
Reinobra 555
Relania 54
Relembra 330
Secura 175
Serdebra 606
Serenebra 394
Solidera 395
Talera 538
Torpera 102
Tortura 25
Unica 13
Utobra 355
Venebra 485
Vita 31
Vunira 154
Wintera 415
Wizera 209
Xandebra 528
Xylona 16
Yonabra 419
Ysolera 87
Zenobra 281
Zuna 3
Zunera 39
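Since the question is about logging these counts every day, the printed pairs could instead be appended to a CSV file together with the date. The following is only a minimal sketch: the helper name `append_counts`, the file name `players_online.csv`, and the sample values are assumptions for illustration, not part of the original code.

```python
import csv
from datetime import date


def append_counts(path, counts, day=None):
    """Append one (date, world, players) row per world to a CSV file."""
    day = day or date.today()
    # newline="" is what the csv module documentation recommends for writers.
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for world, players in counts:
            writer.writerow([day.isoformat(), world, players])


if __name__ == "__main__":
    # Hypothetical sample values; in practice they would come from the loop above.
    append_counts("players_online.csv", [("Astera", 419), ("Belobra", 743)])
```

Opening the file in a `with` block closes it after each run, which the bare `open(...)` call in the question does not do.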
Another way to pick the rows: select all links and check whether the text of the preceding header is "Regular Worlds":
import requests
from bs4 import BeautifulSoup

url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select(".TableContent td > a"):
    # check if we are in "Regular Worlds" table:
    header = a.find_previous("td", {"style": "text-align: center;"})
    if header.get_text(strip=True) != "Regular Worlds":
        continue
    name = a.get_text(strip=True)
    pop = a.find_next("td").get_text(strip=True)
    print("{:<30} {}".format(name, pop))
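As for why the original comprehension returned only empty lists: iterating over a `Tag` such as `parent_tr` yields its direct children, which here are the `td` cells themselves. Calling `findAll('td')` on each cell then searches inside that cell, where no further `td` exists. The sketch below reproduces this offline on a small hand-written table (the HTML and the two-name `worlds` list are assumptions standing in for the real page) and shows that searching from the row itself returns the cells.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row stand-in for the worlds table, so the sketch runs offline.
html = """
<table>
<tr class="Odd"><td><a href="?world=Astera">Astera</a></td><td>419</td><td>North America</td><td>Optional PvP</td></tr>
<tr class="Even"><td><a href="?world=Belobra">Belobra</a></td><td>743</td><td>South America</td><td>Optional PvP</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
worlds = ["Astera", "Belobra"]  # assumed contents of the question's `worlds`

players = {}
for link in soup.find_all("a", string=worlds):
    parent_tr = link.find_parent("tr")

    # Iterating over a Tag yields its direct children, here the <td> cells.
    # No cell contains another <td>, so each inner search comes back empty.
    empty = [cell.find_all("td") for cell in parent_tr]
    assert all(found == [] for found in empty)

    # Searching from the row itself returns the cells in document order;
    # index 1 is the player count in this layout.
    cells = parent_tr.find_all("td")
    players[link.get_text(strip=True)] = int(cells[1].get_text(strip=True))

print(players)
```

In the rows shown in the question the player count is likewise the second `td`, so `parent_tr.find_all('td')[1]` should be the cell to read there, assuming the page layout has not changed.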