python - Getting 'a' tags from Beautiful Soup
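Problem description
I have an HTML page loaded as the soup a. On that page, I want to find the href of every link whose enclosing tag contains the text "AFT" (case-insensitive). So far I have done: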
>>> rows = a.findAll('span', attrs={'class': 'views-field views-field-title'})
The output is:
[<span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-btfs" hreflang="en">30 October 2020: AFT’s next issuance of BTFs: Monday 02 November 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT’s next issuance of long-term OATs: Thursday 05 November 2020</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201026-issuance-btfs" hreflang="en">26 October 2020: AFT's issuance: 5.289 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201023-next-issuance-btfs" hreflang="en">23 October 2020: AFT’s next issuance of BTFs: Monday 26 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201019-issuance-btfs" hreflang="en">19 October 2020: AFT's issuance: 5.489 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201016-next-issuance-btfs" hreflang="en">16 October 2020: AFT’s next issuance of BTFs: Monday 19 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201015-next-issuance-inflation-indexed-oats" hreflang="en">15 October 2020: AFT’s issuance: 1.000 billion euros of inflation-indexed OATs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201015-issuance-oats" hreflang="en">15 October 2020: AFT’s issuance: 7.240 billion euros of medium-term OATs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201012-issuance-btfs" hreflang="en">12 October 2020: AFT's issuance: 5.288 billion euros of BTFs</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-indexed-oats" hreflang="en">09 October 2020: AFT’s next issuance of inflation-indexed OATs: Thursday 15 October 2020</a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-btfs" hreflang="en">09 October 2020: AFT’s next issuance of BTFs: Monday 12 October 2020 </a>
</span></span>, <span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201009-next-issuance-oats" hreflang="en">09 October 2020: AFT’s next issuance of medium-term OATs: Thursday 15 October 2020</a>
</span></span>]
So from the output above I want every href except the one in the second element of the list, because its text does not contain 'AFT':
<span class="views-field views-field-title"><span class="field-content">
<a href="/index.php/en/publications/communiques-presse/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT’s next issuance of long-term OATs: Thursday 05 November 2020</a>
</span></span>
Can someone help me extract the hrefs as a list, either from rows or possibly directly from a? Thanks.
Solution
# Keep the href of each row whose text contains "AFT", ignoring case
href = [row.find('a').get('href') for row in rows if 'aft' in row.text.lower()]
print(href)
Output
['/index.php/en/publications/communiques-presse/20201030-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201026-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201023-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201019-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201016-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201015-next-issuance-inflation-indexed-oats',
'/index.php/en/publications/communiques-presse/20201015-issuance-oats',
'/index.php/en/publications/communiques-presse/20201012-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-indexed-oats',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-btfs',
'/index.php/en/publications/communiques-presse/20201009-next-issuance-oats']
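
As a variation, here is a minimal, self-contained sketch of the same filtering done with a compiled regular expression and re.IGNORECASE. It starts once from the title spans (as in the question) and once from the <a> tags directly. The sample HTML is a shortened, hypothetical stand-in for the real page: the hrefs are abbreviated, and the lowercase "aft" in the third link was added only to exercise the case-insensitive match. The names pattern, hrefs_from_rows and hrefs_from_soup are illustrative and do not appear in the original post.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample modelled on the markup in the question (hrefs shortened).
html = """
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201030-next-issuance-btfs" hreflang="en">30 October 2020: AFT’s next issuance of BTFs</a>
</span></span>
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201030-next-issuance-oats" hreflang="en">30 October 2020: BFT’s next issuance of long-term OATs</a>
</span></span>
<span class="views-field views-field-title"><span class="field-content">
<a href="/en/20201026-issuance-btfs" hreflang="en">26 October 2020: aft's issuance of BTFs</a>
</span></span>
"""

soup = BeautifulSoup(html, "html.parser")

# Plain substring match, ignoring case. This would also match words such as
# "after"; use r"\baft\b" instead if only the whole word should count.
pattern = re.compile(r"aft", re.IGNORECASE)

# Approach 1: start from the title spans, as in the question.
rows = soup.find_all("span", attrs={"class": "views-field views-field-title"})
hrefs_from_rows = [
    row.find("a").get("href")
    for row in rows
    # Skip rows that have no link, then test the row's text against the pattern.
    if row.find("a") is not None and pattern.search(row.get_text())
]

# Approach 2: search the <a> tags directly; string=pattern tests each link's own text.
hrefs_from_soup = [a["href"] for a in soup.find_all("a", href=True, string=pattern)]

print(hrefs_from_rows)
print(hrefs_from_soup)
```

The second approach relies on each link containing a single text node, which is the case for the markup shown in the question. If a link held nested tags, its .string would be None and it would be skipped, so the row-based approach is the safer of the two.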