Looping through HTML with Beautiful Soup in Python

Problem description

I'm trying to loop through an HTML table.

The page I'm looking at has only one table, so it's easy to locate. Under it there are several <tr>s, and I want to loop through all of them except the header rows, which use <th> instead of <td>. Each <tr> contains several <td>s with different classes. I only want to collect the two <td>s with class="table-name" and the <td> with class="table-score".

I have tried to work with:

rows = html.find("table", class_="table").find_all("tr")

for row in rows:
    if row.find("th") is None:
        td_names = row.findall("td")

for td_name in td_names:
    print(td_name)

But I'm not really having any success with that.

So basically the HTML looks something like this:

<table>
  <tr>
    <th>Header</th>
  </tr>
  <tr>
    <td class="table-rank">1</td>
    <td class="table-name">John</td>
    <td class="table-name">Jim</td>
    <td class="table-place">Russia</td>
    <td class="table-score">2-1</td>
  </tr>
</table>

I'm only looking for "John", "Jim", "2-1".

Thanks in advance.

Tags: python, beautifulsoup

Solution


find_all() returns a list of all elements matching the filter. You can index into that list to pick the element you need: 0 for the first, 1 for the second, and so on.

from bs4 import BeautifulSoup
html="""
<table>
<tr>
<th>Header</th>
</tr>
<tr>
<td class="table-rank">1</td>
<td class="table-name">John</td>
<td class="table-name">Jim</td>
<td class="table-place">Russia</td>
<td class="table-score">2-1</td>
</tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')
# The second <tr> in the table (indexing starts at 0); the first one is the header row
our_tr = soup.find('table').find_all('tr')[1]
# All <td> cells of the second <tr>
our_tds = our_tr.find_all('td')
print(our_tds[1].text)  # first "table-name" cell
print(our_tds[2].text)  # second "table-name" cell
print(our_tds[4].text)  # "table-score" cell

Output

John
Jim
2-1
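
The index-based approach above depends on the cells always appearing in the same order. Since the question asks for cells by class, here is a minimal sketch that loops over every row, skips rows without <td> cells (the header rows), and selects cells by class instead of by position. It assumes the same sample HTML as above; the names results, names and score are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr>
    <th>Header</th>
  </tr>
  <tr>
    <td class="table-rank">1</td>
    <td class="table-name">John</td>
    <td class="table-name">Jim</td>
    <td class="table-place">Russia</td>
    <td class="table-score">2-1</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

results = []  # illustrative name: one (names, score) tuple per data row
for row in soup.find("table").find_all("tr"):
    # Header rows in this sample contain only <th> cells, so skip rows with no <td>.
    if row.find("td") is None:
        continue
    # class_ filters on the CSS class; this returns both "table-name" cells in order.
    names = [td.get_text(strip=True) for td in row.find_all("td", class_="table-name")]
    score_td = row.find("td", class_="table-score")
    # Guard against rows that have no score cell.
    score = score_td.get_text(strip=True) if score_td is not None else None
    results.append((names, score))

for names, score in results:
    print(*names, score)
```

Two details in the attempt from the question are worth checking against this sketch: the method is spelled find_all (findall is not a Beautiful Soup method), and the sample table has no class attribute, so find("table", class_="table") would not match it.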
