Empty list when I use findparent and findAll to scrape a table

Problem description

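I'm working on a project that collects the daily player counts of some of the game's servers so I can see how they evolve. The data is in a table where each server is a "tr" containing several "td" cells, which hold things like the number of players plus information I don't need. I managed to pick out all the "tr" rows I'm interested in and discard the ones I don't want, but now I'm stuck: I want to select only the "td" with the number of players from each "tr", and I can't get it to work.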

Here is the table:
[Screenshot: the table]

Here is the table's HTML:
[Screenshot: the table's HTML]

Here is the code I have written so far:

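import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv
from pprint import pprint
from datetime import date

# Worlds to track (the list was not shown in the question; these are
# the names that appear in the printed output below)
worlds = ['Astera', 'Belobra', 'Calmera', 'Celebra', 'Gentebra', 'Kalibra',
          'Luminera', 'Menera', 'Nefera', 'Pacera', 'Yonabra']

url = 'https://www.tibia.com/community/?subtopic=worlds'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
file = open('players_online', 'a', newline='')
writer = csv.writer(file)

list_of_players = list()
# Every <a> whose text is one of the world names
finding_td = soup.find_all('a', string=worlds)
for looking_for_players in finding_td:
    # The <tr> row that contains the link
    parent_tr = looking_for_players.find_parent('tr')
    # This is the line that ends up as a list of empty lists
    names1 = [clean_data.findAll('td') for clean_data in parent_tr]
    list_of_players.append(parent_tr)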

If I print(finding_td), I get the following:

[<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Astera">Astera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Belobra">Belobra</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Calmera">Calmera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Celebra">Celebra</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Gentebra">Gentebra</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Kalibra">Kalibra</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Luminera">Luminera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Menera">Menera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Nefera">Nefera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Pacera">Pacera</a>, 
<a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Yonabra">Yonabra</a>]

That is what I want. Next I use find_parent, and when I print(parent_tr) I get:

<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Belobra">Belobra</a></td><td style="text-align: right;">731</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since June 22, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Calmera">Calmera</a></td><td style="text-align: right;">318</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Celebra">Celebra</a></td><td style="text-align: right;">559</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since October 29, 2018.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Gentebra">Gentebra</a></td><td style="text-align: right;">757</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Kalibra">Kalibra</a></td><td style="text-align: right;">716</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since December 12, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Luminera">Luminera</a></td><td style="text-align: right;">295</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Menera">Menera</a></td><td style="text-align: right;">364</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 5, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Nefera">Nefera</a></td><td style="text-align: right;">465</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since April 19, 2018.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Odd"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Pacera">Pacera</a></td><td style="text-align: right;">336</td><td>North America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since September 12, 2017.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>
<tr class="Even"><td style="width: 150px;"><a href="https://www.tibia.com/community/?subtopic=worlds&amp;world=Yonabra">Yonabra</a></td><td style="text-align: right;">446</td><td>South America</td><td>Optional PvP</td><td align="center" valign="middle"> <span style="width: 18px; height: 18px;"><a href="../common/help.php?subtopic=battleye" target="_blank"><span class="HelperDivIndicator" onmouseout="$('#HelperDivContainer').hide();" onmouseover="ActivateHelperDiv($(this), 'BattlEye Protected Game World', '&lt;p&gt;On this game world, BattlEye blocks cheats from the game. The game world has been protected by BattlEye since May 27, 2020.&lt;/p&gt;', '');"><img src="https://static.tibia.com/images/global/content/icon_battleye.gif" style="border: 0px;"/></span></a></span></td><td></td></tr>

So far so good. Now that I have all the td cells, I want a single line that selects only the td containing the number of players. I do it like this:

names1 = [clean_data.findAll('td') for clean_data in parent_tr]

But when I append or print it, I get:

[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]
[[], [], [], [], [], []]

If I use names1 = [clean_data.find('td')[3] for clean_data in parent_tr] to get the specific td that holds the data I want, the console says:

"IndexError: list index out of range"

Which makes sense, since it is an empty list after all. Any idea what is going wrong?

Tags: html, web-scraping, beautifulsoup

Solution


To get the name and population of each regular world, you can try:

import requests
from bs4 import BeautifulSoup


url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

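# Assumes the third .TableContent element on the page is the "Regular Worlds" table
for a in soup.select(".TableContent")[2].select("td > a"):
    name = a.get_text(strip=True)
    # The first <td> after the link is the players-online cell
    pop = a.find_next("td").get_text(strip=True)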

    print("{:<30} {}".format(name, pop))

Prints:

Adra                           54
Antica                         271
Assombra                       166
Astera                         419
Belluma                        18
Belobra                        743
Bona                           150
Calmera                        326
Carnera                        73
Celebra                        560
Celesta                        82
Concorda                       51
Cosera                         103
Damora                         89
Descubra                       524
Dibra                          379
Duna                           6
Emera                          70
Epoca                          23
Estela                         129
Faluna                         24
Ferobra                        599
Firmera                        180
Funera                         83
Furia                          14
Garnera                        299
Gentebra                       769
Gladera                        464
Harmonia                       112
Helera                         55
Honbra                         629
Impera                         340
Inabra                         636
Javibra                        229
Jonera                         131
Kalibra                        700
Karna                          175
Kenora                         90
Libertabra                     364
Lobera                         473
Luminera                       293
Lutabra                        469
Macabra                        277
Menera                         365
Mitigera                       100
Monza                          112
Mudabra                        427
Nefera                         475
Noctera                        195
Nossobra                       252
Olera                          87
Ombra                          601
Optera                         186
Pacembra                       402
Pacera                         352
Peloria                        226
Premia                         74
Pyra                           4
Quelibra                       578
Quintera                       280
Ragna                          15
Refugia                        103
Reinobra                       555
Relania                        54
Relembra                       330
Secura                         175
Serdebra                       606
Serenebra                      394
Solidera                       395
Talera                         538
Torpera                        102
Tortura                        25
Unica                          13
Utobra                         355
Venebra                        485
Vita                           31
Vunira                         154
Wintera                        415
Wizera                         209
Xandebra                       528
Xylona                         16
Yonabra                        419
Ysolera                        87
Zenobra                        281
Zuna                           3
Zunera                         39
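As for why the code in the question produced lists of empty lists: iterating over a Tag, as in "for clean_data in parent_tr", yields the row's children, which here are the td cells themselves. findAll('td') on a cell only searches inside that cell, and no cell contains a nested td, so every call returns an empty list (six cells, six empty lists). Calling find_all('td') on the row itself and indexing the result gives the cells; index 1 is the player count (index 0 is the world name). The sketch below reproduces this on a trimmed copy of the Belobra row from the question; the sample HTML is an assumption standing in for the live page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the Belobra row printed in the question (BattlEye markup
# removed). The row still has six <td> cells, like the original.
SAMPLE_ROW = (
    '<table><tr class="Even">'
    '<td style="width: 150px;"><a href="#">Belobra</a></td>'
    '<td style="text-align: right;">731</td>'
    '<td>South America</td><td>Optional PvP</td><td></td><td></td>'
    '</tr></table>'
)

soup = BeautifulSoup(SAMPLE_ROW, "html.parser")
parent_tr = soup.find("a").find_parent("tr")

# Iterating over the <tr> yields its children, i.e. the <td> cells.
# No cell contains a nested <td>, so findAll('td') returns [] for each one.
names1 = [clean_data.findAll("td") for clean_data in parent_tr]
print(names1)  # [[], [], [], [], [], []]

# Searching the row itself returns its cells; index 1 holds the player count.
cells = parent_tr.find_all("td")
players = cells[1].get_text(strip=True)
print(players)  # 731
```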

Or: select all the links and check whether the text of the preceding header is "Regular Worlds":

import requests
from bs4 import BeautifulSoup


url = "https://www.tibia.com/community/?subtopic=worlds"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select(".TableContent td > a"):
    # check if we are in "Regular Worlds" table:
    header = a.find_previous("td", {"style": "text-align: center;"})
    if header is None or header.get_text(strip=True) != "Regular Worlds":
        continue

    name = a.get_text(strip=True)
    pop = a.find_next("td").get_text(strip=True)

    print("{:<30} {}".format(name, pop))
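Since the goal in the question is to track counts day by day, the name/population pairs from either loop can be collected into a dict and written to the CSV file the question opens, with the date on each row. This is a minimal sketch under assumptions: world_populations and write_snapshot are hypothetical helper names, the (date, world, players) column layout is a choice rather than anything the page dictates, the sample table stands in for the live page so the example runs offline, and the fixed date is only there to make the output reproducible.

```python
import csv
import io
from datetime import date

from bs4 import BeautifulSoup

# Hand-written stand-in for the "Regular Worlds" table (assumed structure,
# modelled on the rows printed in the question).
SAMPLE_HTML = """
<table class="TableContent">
  <tr><td><a href="#">Belobra</a></td><td style="text-align: right;">731</td><td>South America</td></tr>
  <tr><td><a href="#">Calmera</a></td><td style="text-align: right;">318</td><td>North America</td></tr>
</table>
"""


def world_populations(table):
    """Map each world name in `table` to its players-online text."""
    result = {}
    for a in table.select("td > a"):
        # As in the answer: the first <td> after the link is the count cell.
        result[a.get_text(strip=True)] = a.find_next("td").get_text(strip=True)
    return result


def write_snapshot(populations, f, day=None):
    """Write one (date, world, players) row per world to an open text file."""
    day = day or date.today()
    writer = csv.writer(f)
    for world, players in populations.items():
        writer.writerow([day.isoformat(), world, players])


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
populations = world_populations(soup.select_one(".TableContent"))

# A StringIO buffer keeps the example self-contained; for a daily log,
# open the real file in append mode instead, e.g.
#   with open("players_online", "a", newline="") as f:
#       write_snapshot(populations, f)
buffer = io.StringIO()
write_snapshot(populations, buffer, day=date(2021, 1, 1))
print(buffer.getvalue())
```

Opening the file with "a" keeps earlier days' rows, so running the script once a day builds up a history; newline="" is what the csv module documentation recommends for files passed to csv.writer.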
