Unable to extract HTML attribute

Problem description

I am new to Python and am having trouble scraping some HTML.

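I am trying to extract the label text "Coupon", "Maturity Date", and "Initial Offering Price/Yield" from the HTML below, along with the text inside the float-right class. I have only included part of the HTML, but there are nine different sections I am trying to extract.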

<span class="label genericQtipHelp" help="Annual interest rate payable on a security expressed as a percentage of the principal" data-hasqtip="160" aria-describedby="qtip-160">Coupon:</span>
<span class="float-right">3 %</span>
<span class="label genericQtipHelp" help="Date the principal becomes due and payable to bondholders" data-hasqtip="161">Maturity Date:</span>
<span class="float-right">08/12/2021</span>
<span class="label genericQtipHelp" help="Price / Yield at which a new issue of municipal securities is offered to the public" data-hasqtip="163">Initial Offering Price/Yield:</span>
<span class="float-right">3 %</span>

I was able to extract the data from the second line of the HTML (the float-right class) using the following code:

Input

elements2 = driver.find_elements_by_class_name("float-right")
for data2 in elements2:
    print(data2.text)

Output

3 %
08/01/2022
08/12/2021
102.829% / 0.08%
$4,525,000
07/30/2021 09:14 AM
07/30/2021 01:30 PM
08/12/2021
-

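This returns all the data stored in the float-right class, which is exactly what I need. However, when I try to extract "Maturity Date" and the other labels from the first line of the HTML, I get an error. I believe this is because I am trying to get an attribute?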

The code I used to try to pull "Maturity Date" and the other label text is below:

Input 1

elements = driver.find_elements_by_class_name("label genericQtipHelp").__getattribute__('data-hasqtip')
for data in elements:
   print(data.text)

Output 1

 elements = driver.find_elements_by_class_name("label genericQtipHelp").__getattribute__('data-hasqtip')
AttributeError: 'list' object has no attribute 'data-hasqtip'

Input 2

elements = soup.find('class', attrs={"label genericQtipHelp":'data-hasqtip'})
print(elements.text)

Output 2

AttributeError: 'NoneType' object has no attribute 'text'

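I have tried a few other things but ended up with similar errors. How can I extract this data, and is there an easier way to pull all nine label/value pairs (for example, "Maturity Date" and "3 %")?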

Thanks!

Tags: python, html, selenium, web-scraping, beautifulsoup

Solution


You can use BeautifulSoup.

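  • Find all <span> elements with the class name label genericQtipHelp using the .find_all() method. This gives you a list of all <span> elements with that class name. Their text is the field name (Maturity Date, Coupon, Yield, etc.).
  • Iterate over that list and, for each span, find the next span (the value lives inside it) using findNext('span') and read its text.
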
from bs4 import BeautifulSoup

s = '''
<span class="label genericQtipHelp" help="Annual interest rate payable on a security expressed as a percentage of the principal" data-hasqtip="160" aria-describedby="qtip-160">Coupon:</span>
<span class="float-right">3 %</span>
<span class="label genericQtipHelp" help="Date the principal becomes due and payable to bondholders" data-hasqtip="161">Maturity Date:</span>
<span class="float-right">08/12/2021</span>
<span class="label genericQtipHelp" help="Price / Yield at which a new issue of municipal securities is offered to the public" data-hasqtip="163">Initial Offering Price/Yield:</span>
<span class="float-right">3 %</span>'''

soup = BeautifulSoup(s, 'lxml')
spans = soup.find_all('span', class_='label genericQtipHelp')


for span in spans:
    name = span.text.strip()
    val = span.findNext('span').text.strip()
    print(f"{name:35} {val}")

Output

Coupon:                             3 %
Maturity Date:                      08/12/2021
Initial Offering Price/Yield:       3 %
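
As for why the two attempts in the question failed: find_elements_by_class_name returns a Python list, so the attribute lookup was made on the list rather than on an element, and the class-name locator expects a single class, not the two space-separated classes in "label genericQtipHelp". In the second attempt, soup.find('class', ...) searches for a tag named <class>, finds nothing, and returns None. The following is a minimal sketch, not the original answerer's code, of one way to read the attributes themselves (help, data-hasqtip) alongside each label and value. It assumes the built-in "html.parser" backend and a shortened copy of the sample HTML; the dictionary keys (name, value, qtip_id, help) are illustrative names chosen here.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question.
html = '''
<span class="label genericQtipHelp" help="Annual interest rate payable on a security expressed as a percentage of the principal" data-hasqtip="160" aria-describedby="qtip-160">Coupon:</span>
<span class="float-right">3 %</span>
<span class="label genericQtipHelp" help="Date the principal becomes due and payable to bondholders" data-hasqtip="161">Maturity Date:</span>
<span class="float-right">08/12/2021</span>
'''

# "html.parser" ships with Python. When working with Selenium,
# driver.page_source could be passed here instead of the literal string.
soup = BeautifulSoup(html, "html.parser")

rows = []
# The CSS selector matches spans that carry both classes, in any order.
for label in soup.select("span.label.genericQtipHelp"):
    # The value sits in the next span with class "float-right".
    value = label.find_next("span", class_="float-right")
    rows.append({
        "name": label.get_text(strip=True).rstrip(":"),
        "value": value.get_text(strip=True) if value else None,
        # .get() returns None instead of raising if the attribute is missing.
        "qtip_id": label.get("data-hasqtip"),
        "help": label.get("help"),
    })

for row in rows:
    print(row["name"], "|", row["value"], "|", row["qtip_id"])
```

Using .get() for attributes and checking that a following span exists keeps the loop from raising if one of the nine sections on the real page is missing an attribute or a value.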
