Scraping text from tags with the same class

Problem description

This is the HTML I am trying to get text from:

<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>21</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>17</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>14</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>7</div></div>
<div
</div></div></div>

I want to get the first div and the second div of every element with this class, separately. For example, the first divs:

0
0
0
0

The second divs:

21
17
14
7

Tags: python-3.x, beautifulsoup

Solution


You can use the CSS selector :nth-of-type:

from bs4 import BeautifulSoup

html_doc = """<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>21</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>17</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>14</div></div>
<div class="scoreboardColumn-2OtpR compactHeader-1b8nN"><div>0</div><div>7</div></div>
<div
</div></div></div>"""

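soup = BeautifulSoup(html_doc, "html.parser")

# First <div> child of every scoreboard column
for first_div in soup.select(".scoreboardColumn-2OtpR > div:nth-of-type(1)"):
    print(first_div.text)

print()  # blank line between the two groups

# Second <div> child of every scoreboard column
for second_div in soup.select(".scoreboardColumn-2OtpR > div:nth-of-type(2)"):
    print(second_div.text)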

Prints:

0
0
0
0

21
17
14
7
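A note on how :nth-of-type counts: it numbers an element among its siblings with the same tag name, whereas :nth-child numbers it among all sibling elements. With the markup in the question both give the same result, but they diverge as soon as a column contains a non-div child. The sketch below assumes a hypothetical <span> label at the start of each column (the question's HTML has none) purely to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <span> label precedes the two <div>s in each column.
# This <span> is NOT in the question's HTML; it only illustrates the selectors.
html_doc = """
<div class="scoreboardColumn-2OtpR"><span>Q1</span><div>0</div><div>21</div></div>
<div class="scoreboardColumn-2OtpR"><span>Q2</span><div>0</div><div>17</div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# :nth-of-type(1) counts only <div> siblings, so the <span> is skipped
by_type = [d.get_text() for d in soup.select(".scoreboardColumn-2OtpR > div:nth-of-type(1)")]

# :nth-child(1) counts every element sibling; child 1 is the <span>, so no <div> matches
by_child = [d.get_text() for d in soup.select(".scoreboardColumn-2OtpR > div:nth-child(1)")]

# With :nth-child, the first <div> is child number 2
by_child2 = [d.get_text() for d in soup.select(".scoreboardColumn-2OtpR > div:nth-child(2)")]

print(by_type)    # ['0', '0']
print(by_child)   # []
print(by_child2)  # ['0', '0']
```

For markup where every child of the column is a <div>, as in the question, either selector works; :nth-of-type is simply the more forgiving choice if other tags are mixed in.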

Or, without CSS selectors:

# Visit each scoreboard column and take its <div> children by index
for s in soup.find_all(class_="scoreboardColumn-2OtpR"):
    divs = s.find_all("div")
    print("First: {} Second: {}".format(divs[0].text, divs[1].text))

Prints:

First: 0 Second: 21
First: 0 Second: 17
First: 0 Second: 14
First: 0 Second: 7
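Note that find_all("div") searches all descendants of the column, not only its direct children. That is fine for the markup in the question, but if a value were ever wrapped in another <div>, divs[1] would no longer be the second cell. A minimal sketch, assuming a hypothetical nested <div> in the first column (not present in the question), comparing the default search with recursive=False:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: in the first column the "0" is wrapped in an extra <div>.
# The question's HTML has no such nesting; it is added only for illustration.
html_doc = """
<div class="scoreboardColumn-2OtpR"><div><div>0</div></div><div>21</div></div>
<div class="scoreboardColumn-2OtpR"><div>0</div><div>17</div></div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

recursive_second = []
direct_second = []
for s in soup.find_all(class_="scoreboardColumn-2OtpR"):
    nested = s.find_all("div")                   # every descendant <div>
    direct = s.find_all("div", recursive=False)  # direct children only
    recursive_second.append(nested[1].get_text())
    direct_second.append(direct[1].get_text())

print(recursive_second)  # ['0', '17']  (first column picks up the nested <div>)
print(direct_second)     # ['21', '17']
```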
