python - BeautifulSoup: replace all tags that have an aria-level attribute with <h> tags of the same level
Problem description
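I have an HTML source in which <div> elements are used as headings. Using BeautifulSoup and the aria-level attribute, I want to replace all of these <div> elements with <h> tags of the same level.
My code kind of works for my purpose, but it does not seem elegant, and ideally the attributes of the former <div> elements would also be removed.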
import bs4
html = '''<div id="container">
<div role="heading" aria-level="1">The main page heading</div>
<p>This article is about showing a page structure.</p>
<div role="heading" aria-level="2">Introduction</div>
<p>An introductory text.</p>
<div role="heading" aria-level="2">Chapter 1</div>
<p>Text</p>
<div role="heading" aria-level="3">Chapter 1.1</div>
<p>More text in a sub section.</p>
</div>'''
soup = bs4.BeautifulSoup(html, "html.parser")
for divheader in soup.find_all("div", {"aria-level": "1"}):
    divheader.name = "h1"
for divheader in soup.find_all("div", {"aria-level": "2"}):
    divheader.name = "h2"
for divheader in soup.find_all("div", {"aria-level": "3"}):
    divheader.name = "h3"
print(soup)
Output:
<div id="container">
<h1 aria-level="1" role="heading">The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2 aria-level="2" role="heading">Introduction</h2>
<p>An introductory text.</p>
<h2 aria-level="2" role="heading">Chapter 1</h2>
<p>Text</p>
<h3 aria-level="3" role="heading">Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
What it should look like:
<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
Solution
You can use del div.attrs to remove all attributes from a tag:
for div in soup.select("div[aria-level]"):
    div.name = f'h{div["aria-level"]}'
    del div.attrs
print(soup)
Prints:
<div id="container">
<h1>The main page heading</h1>
<p>This article is about showing a page structure.</p>
<h2>Introduction</h2>
<p>An introductory text.</p>
<h2>Chapter 1</h2>
<p>Text</p>
<h3>Chapter 1.1</h3>
<p>More text in a sub section.</p>
</div>
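If you would rather not rely on del div.attrs, here is a minimal sketch (not part of the original answer) that wraps the same select()-based loop in a function, clears the attributes by assigning an empty dict, and skips aria-level values outside 1 to 6, since HTML only defines h1 to h6. The function name div_headings_to_h and the skip-out-of-range behaviour are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup


def div_headings_to_h(html: str) -> str:
    """Turn <div aria-level="N"> into <hN> and drop the div's attributes.

    Hypothetical helper: only levels 1-6 are converted, because HTML
    defines h1-h6; divs with any other aria-level value are left untouched.
    """
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list, so renaming tags while looping is safe.
    for div in soup.select("div[aria-level]"):
        level = div["aria-level"].strip()
        if level.isdecimal() and 1 <= int(level) <= 6:
            div.name = f"h{int(level)}"
            # Reassigning an empty dict removes every attribute explicitly.
            div.attrs = {}
    return str(soup)
```

Calling div_headings_to_h(html) on the HTML from the question should give the same markup as the printed result above; the range check only matters if the source contains unusual aria-level values.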