Extracting text up to a tag in Beautiful Soup

Problem description

I want to extract the text from a div up to the <br> tag. How can I do this?

For example,

<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>

I tried this:

print(content.text)

It outputs:

Watched a video that has been removedAug 17, 2018, 2:34:28 PM UTC

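But the expected output is: Watched a video that has been removed

I don't want the text after <br>.

Also, I can try this to get specifically the text after <br>: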

content.find('br').text

Now I would like to do something like the following:

result = content.text.replace(content.find('br').text, '')

Is there a better way to do this in BeautifulSoup that avoids the extra string replace?

Tags: python, beautifulsoup, html-parsing

Solution


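from bs4 import BeautifulSoup

html = """<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""
# Name the parser explicitly; without it bs4 picks one itself and emits a warning.
soup = BeautifulSoup(html, "html.parser")
# contents[0] is the first child of the div: the text node before <br>.
print(soup.find("div").contents[0])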

The output should be:

Watched a video that has been removed
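
Note that contents[0] returns only the first child of the div, so it would miss part of the text if the portion before <br> contained nested tags (for example <b>). As a possible alternative, here is a minimal sketch that walks the div's direct children and stops at the first <br>. The helper name text_before_tag is hypothetical and not part of Beautiful Soup, and the sketch assumes the <br> is a direct child of the div.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """<div class="content-cell mdl-cell mdl-cell--6-col mdl-typography--body-1">Watched a video that has been removed<br>Aug 17, 2018, 2:34:28 PM UTC</div>"""


def text_before_tag(element, tag_name):
    """Hypothetical helper: join the text of element's direct children
    that appear before the first direct child named tag_name."""
    parts = []
    for child in element.children:
        if isinstance(child, Tag) and child.name == tag_name:
            break  # stop at the first matching tag
        # Nested tags contribute their text; text nodes are plain strings.
        parts.append(child.get_text() if isinstance(child, Tag) else str(child))
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")
content = soup.find("div")
result = text_before_tag(content, "br")
print(result)
```

For the div in the question this gives the same text as contents[0]; the loop only matters when the text before <br> is split across several child nodes.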
