Select the first span tag from a div class

Problem description

<div class="ticket_last_24 report_table_right">
  <span>13,978</span>
  <span>(</span><span class="change_increase">+2.3%
  </span><span>)</span>
</div>

<div class="ticket_last_week report_table_right">
  <span>99,585</span>
  <span>(</span><span class="change_increase">+0.6%
  </span><span>)</span>
</div>

<div class="ticket_last_24 report_table_right">
  <span>12121</span>
  <span>(</span><span class="change_increase">+2.3%
  </span><span>)</span>
</div>

<div class="ticket_last_week report_table_right">
  <span>99,222</span>
  <span>(</span><span class="change_increase">+0.6%
  </span><span>)</span>
</div>

I tried the code below:

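from bs4 import BeautifulSoup

text = []
# soup is a BeautifulSoup object for the page; how it was built was not posted
TicketNuber = soup.find_all("div")
for div in TicketNuber:
    # find() returns the first <span> anywhere inside this div
    text.append(div.find("span"))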
It prints out:
[
 '13,978',
 '13,978',
 '99,585',
 '12,121',
 '12,121',
 '99,222'
]

I don't know why the first number is printed twice. I only want the numbers ['13,978','99492','12,121','99,222']. No number is repeated within the same tag.

Tags: html, python-3.x, beautifulsoup

Solution


This might do the job:

from bs4 import BeautifulSoup

document = '''
<div class="ticket_last_24 report_table_right">
  <span>13,978</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_week report_table_right">
  <span>99,585</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_24 report_table_right">
  <span>12121</span>
  <span>(</span><span class="change_increase">+2.3% 
  </span><span>)</span>                       
</div>

<div class="ticket_last_week report_table_right">
  <span>99,222</span>
  <span>(</span><span class="change_increase">+0.6% 
  </span><span>)</span>
</div>
'''

soup = BeautifulSoup(document, "lxml")

for div in soup.find_all("div"):
    print(div.find("span").text)

Output:

13,978
99,585
12121
99,222

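Evidently your HTML document differs from mine, which must mean that the snippet you posted does not match the actual document. You can check this with print(soup). You also posted only part of your code (not an mcve), so I would need to see the whole thing to help further.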
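One possible cause of the duplicates, offered only as a guess because the full document was not posted: each pair of divs may sit inside an outer wrapper div. In that case find_all("div") returns the wrapper too, and the wrapper's first span is the same one as in its first inner div. The sketch below reproduces that pattern with a hypothetical wrapper (the class name "row" is invented for illustration) and then restricts the search to divs carrying the shared report_table_right class. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the outer "row" wrapper is an assumption and is not
# part of the snippet posted in the question.
document = '''
<div class="row">
  <div class="ticket_last_24 report_table_right">
    <span>13,978</span>
    <span>(</span><span class="change_increase">+2.3%</span><span>)</span>
  </div>
  <div class="ticket_last_week report_table_right">
    <span>99,585</span>
    <span>(</span><span class="change_increase">+0.6%</span><span>)</span>
  </div>
</div>
'''

soup = BeautifulSoup(document, "html.parser")

# find_all("div") also returns the wrapper, whose first descendant <span>
# is the same one found in the first inner div, hence the duplicate.
all_divs = [div.find("span").get_text(strip=True) for div in soup.find_all("div")]
print(all_divs)

# Matching only the divs that carry the shared class skips the wrapper.
numbers = [
    div.find("span").get_text(strip=True)
    for div in soup.find_all("div", class_="report_table_right")
]
print(numbers)
```

If the real page has no such wrapper, this does not apply, and comparing print(soup) with the posted snippet remains the way to find the actual difference.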

