Removing a selected tag inside an element with BeautifulSoup

Problem description

A page contains several h1 elements. In the first h1, I want to remove the tag with the class read-time. Here is my attempt, but the tag is not removed. What am I doing wrong?

h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

if real_h1.select('.read-time') is not None:
    real_h1.select('.read-time').clear()

print("AFTER: main.select('h1')", main.select('h1'))

Log

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]

Tags: python, python-3.x, beautifulsoup

Solution


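Use decompose() to remove the tag. In the original attempt, select() returns a list-like ResultSet, which is never None, and calling .clear() on it only empties that list; the parsed tree is left untouched.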
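As a small check of why the attempt in the question changes nothing, the sketch below shows that select() returns a list-like ResultSet even when nothing matches, and that clearing it leaves the document as it was. The shortened HTML string and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative markup (not the exact page from the question).
html = '<h1>Introduction<span class="read-time"> min read</span></h1>'
soup = BeautifulSoup(html, "html.parser")
h1 = soup.select_one("h1")

matches = h1.select(".read-time")       # list-like ResultSet with one Tag
missing = h1.select(".no-such-class")   # empty ResultSet, still not None

# This is list.clear(): it empties the Python list, not the parsed tree.
matches.clear()
unchanged = str(h1)
```

Because the `is not None` test always passes and `.clear()` acts on the list, the code runs without an error and the h1 stays the same.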

from bs4 import BeautifulSoup

html = '''<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>
<h1 id="before-you-begin">Before You Begin</h1>'''
main = BeautifulSoup(html, 'html.parser')
h1s = main.select('h1')

print("BEFORE: main.select('h1')", main.select('h1'))

real_h1 = h1s[0]

# select_one() returns the first matching Tag, or None if nothing matches
read_time = real_h1.select_one('.read-time')
if read_time is not None:
    # decompose() removes the tag and its children from the tree
    read_time.decompose()

print("AFTER: main.select('h1')", main.select('h1'))

Output:

BEFORE: main.select('h1') [<h1>Introduction<span class="read-time"><span class="minutes"></span> min read</span></h1>, <h1 id="before-you-begin">Before You Begin</h1>]
AFTER: main.select('h1') [<h1>Introduction</h1>, <h1 id="before-you-begin">Before You Begin</h1>]
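If an element may contain more than one tag with the class, the same idea can be wrapped in a loop. The following is a minimal sketch, not part of the original answer: `remove_matching` is a hypothetical helper name, and it assumes the matched tags are not nested inside one another.

```python
from bs4 import BeautifulSoup


def remove_matching(tag, css_selector):
    """Remove every descendant of `tag` that matches `css_selector`.

    Hypothetical helper for illustration; returns how many tags were removed.
    Assumes the matches are not nested inside each other.
    """
    matches = tag.select(css_selector)
    for match in matches:
        # decompose() detaches the tag from the tree and destroys it
        match.decompose()
    return len(matches)


html = (
    '<h1>Introduction<span class="read-time">'
    '<span class="minutes"></span> min read</span></h1>'
    '<h1 id="before-you-begin">Before You Begin</h1>'
)
soup = BeautifulSoup(html, "html.parser")

first_h1 = soup.select("h1")[0]
removed = remove_matching(first_h1, ".read-time")

# Serialize the headings again to inspect the result.
result = [str(h1) for h1 in soup.select("h1")]
```

Only the first h1 is passed to the helper, so other headings are left alone. If the removed tag is still needed afterwards, extract() could be used in place of decompose(), since it detaches the tag and returns it instead of destroying it.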
