How to remove previous siblings in BeautifulSoup

Problem description

I am trying to remove all previous siblings above the <hr /> tag and all next siblings below the <h2> tag. The problem is that I get this error: AttributeError: 'NavigableString' object has no attribute 'decompose'

The HTML I am trying to parse looks like this:

<h1>Heading text</h1>

<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>

<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>

<h2>Heading 2</h2>

<p> this and everything below i want to remove</p>

Feeding in HTML like the sample above does not remove the siblings; it only raises the AttributeError. What am I doing wrong, and how can I fix it?

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

for prev_sibling in soup.find("hr").previous_siblings:
    prev_sibling.decompose()

for next_sibling in soup.find("h2").next_siblings:
    prev_sibling.decompose()

Tags: python, python-3.x, beautifulsoup

Solution


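Use find_previous_siblings() and find_next_siblings(). The .previous_siblings and .next_siblings generators yield every node, including the whitespace between tags, which is a NavigableString and has no decompose() method. The find_* methods return only Tag objects, so decompose() is safe to call on each result.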

from bs4 import BeautifulSoup
html='''<h1>Heading text</h1>
<p style="text-align: justify;">this and everything untop i want to delete</p>
<hr />
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> this and text below i want to keep</p>
<p style="margin: 0px; font-size: 12px; font-family: Helvetica;"> text tex text</p>
<h2>Heading 2</h2>
<p> this and everything below i want to remove</p>'''

soup = BeautifulSoup(html, 'lxml')

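# Remove every tag that comes before <hr /> at the same level
for prev_sibling in soup.find("hr").find_previous_siblings():
    prev_sibling.decompose()

# Remove every tag that comes after <h2> at the same level
for next_sibling in soup.find("h2").find_next_siblings():
    next_sibling.decompose()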

print(soup)
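
For comparison, here is a minimal sketch of two other ways to avoid the AttributeError while still iterating over .previous_siblings and .next_siblings. It is an illustration, not part of the original answer. The shortened HTML sample and the variable names (hr, sibling_types, node) are made up for the example, and it uses the built-in html.parser instead of lxml so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened stand-in for the HTML in the question (hypothetical sample text)
html = """<h1>Heading text</h1>
<p>delete me</p>
<hr />
<p>keep me</p>
<h2>Heading 2</h2>
<p>remove me</p>"""

soup = BeautifulSoup(html, "html.parser")
hr = soup.find("hr")

# .previous_siblings yields every node, including the "\n" text nodes
# between tags; those are NavigableString objects without decompose().
sibling_types = [type(node).__name__ for node in hr.previous_siblings]
print(sibling_types)

# Option 1: keep .previous_siblings but only decompose real tags.
# list() takes a snapshot so removing nodes does not disturb the iteration.
for node in list(hr.previous_siblings):
    if isinstance(node, Tag):
        node.decompose()

# Option 2: extract() exists on both Tag and NavigableString, so it
# also removes the stray whitespace text nodes.
for node in list(soup.find("h2").next_siblings):
    node.extract()

print(soup)
```

Option 1 leaves the whitespace text nodes in place, which is usually harmless. Option 2 removes them as well, which may be preferable when the cleaned markup is written back out.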
