beautifulsoup: text placed inside tags

Question

I am trying to extract a string from an HTML file using BeautifulSoup. The result of my query contains <label> tags. How can I get rid of them?

from bs4 import BeautifulSoup

# Parse the saved HTML file with the lxml parser
with open('/Desktop/filename.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

# Find the div that holds the name and address
string = soup.find('div', class_="col-sm-8 col-xs-6")
print(string)

Output:

<div class="col-sm-8 col-xs-6">
    Sherlock Holmes <br>
    <label for="AgentAddress" style="display: none;">
        Detective's Address
    </label>
    221B Baker Street London <br>
    <label for="AgentCityStateZip" style="display: none;">
        City, State, Zip
    </label>
    London, United Kingdom            
</div>

print(string.text) outputs:

    Sherlock Holmes
    Detective's Address
    221B Baker Street London
    City, State, Zip
    London, United Kingdom 

I am not interested in the text inside the <label></label> tags. How can I remove them so that the output is:

    Sherlock Holmes
    221B Baker Street London
    London, United Kingdom 

Tags: python, html, beautifulsoup

Answer


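For example, you can try calling decompose() on each label before printing. This removes the <label> tags, together with their contents, from the parsed tree: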

for label_element in string.find_all("label"):
    label_element.decompose()
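
Put together, the approach might look like the sketch below. It is only an illustration under a few assumptions: the HTML from the question is inlined as a string instead of being read from the file, the built-in "html.parser" is used instead of lxml so that nothing beyond bs4 is required, and the helper name text_without_labels is hypothetical. After the labels are removed, stripped_strings is used to drop the leftover whitespace.

```python
from bs4 import BeautifulSoup

# The HTML from the question, inlined here so the sketch is self-contained
# (assumption: in the original it is read from a file).
HTML = """
<div class="col-sm-8 col-xs-6">
    Sherlock Holmes <br>
    <label for="AgentAddress" style="display: none;">
        Detective's Address
    </label>
    221B Baker Street London <br>
    <label for="AgentCityStateZip" style="display: none;">
        City, State, Zip
    </label>
    London, United Kingdom
</div>
"""


def text_without_labels(html):
    """Return the div's text lines with every <label> removed (hypothetical helper)."""
    # "html.parser" is an assumption; the question uses "lxml".
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="col-sm-8 col-xs-6")

    # decompose() removes each label and its contents from the tree.
    for label_element in div.find_all("label"):
        label_element.decompose()

    # stripped_strings yields the remaining text nodes, whitespace-trimmed,
    # and skips nodes that contain only whitespace.
    return list(div.stripped_strings)


lines = text_without_labels(HTML)
print("\n".join(lines))
```

Note that decompose() modifies the parsed tree in place, so the labels are also gone from soup afterwards. If the labels are still needed later, parsing the HTML a second time is one simple way around that.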
