python - Using BeautifulSoup to find part of the text in an H tag inside a DIV class

Problem description

Inside a div with class content, I have HTML that looks like this:

<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>

My Python code is

soup.find('div',attrs={'class':'content'}).h2.text

It returns

Brookstone
                         AS20194 Multi-functional Massage Chair

How should I update the code so that it returns only

AS20194 Multi-functional Massage Chair

Tags: python, html, python-3.x, web-scraping, beautifulsoup

There is no need to call .extract(); you can use .find_next_sibling() with the parameter text=True:

from bs4 import BeautifulSoup


txt = '''<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>'''

soup = BeautifulSoup(txt, 'html.parser')

print(soup.h2.strong.find_next_sibling(text=True))

Prints:

 AS20194 Multi-functional Massage Chair
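
The returned text node still carries the surrounding whitespace and newlines from the markup. A minimal sketch of how this might be applied to the div from the question and cleaned up with .strip() follows. The wrapping div markup and the variable names (html, h2, model, direct_text) are assumptions made for illustration. It uses string=True, which is the name Beautiful Soup 4.4.0 and later use for the text argument. The second variant, which keeps only the direct text children of the h2, is an alternative and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Assumed markup: the <h2> from the question wrapped in the div it lives in.
html = '''<div class="content">
<h2>
 <strong>
 Brookstone
 </strong>
 AS20194 Multi-functional Massage Chair
</h2>
</div>'''

soup = BeautifulSoup(html, 'html.parser')
h2 = soup.find('div', attrs={'class': 'content'}).h2

# Text node that follows <strong>; string=True matches text nodes only.
# .strip() removes the leading/trailing whitespace and newlines.
model = h2.strong.find_next_sibling(string=True).strip()

# Alternative: join only the text nodes that are direct children of <h2>,
# which skips anything nested inside <strong>.
direct_text = ''.join(h2.find_all(string=True, recursive=False)).strip()

print(model)
```

If the strong tag may be missing on some pages, h2.strong would be None, so a check before calling find_next_sibling() is probably worth adding.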
