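python - BS4 getText function produces unexpected output
Problem description
The following HTML sample produces different results depending on how the markup is formatted. Here is the sample written on a single line: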
card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300"><li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li><li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li><li><span class="font--weight-500">Experience Length:</span> 1 year</li></ul>
"""
Output:
Minimum Qualification: BachelorExperience Level: Graduate traineeExperience Length: 1 year
And here is the same HTML sample formatted across multiple lines:
card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300">
<li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li>
<li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li>
<li><span class="font--weight-500">Experience Length:</span> 1 year</li>
</ul>
"""
Output:
Minimum Qualification: Bachelor
Experience Level: Graduate trainee
Experience Length: 1 year
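The question is how to make the first case produce the desired output, as the second case does. Here is my current code:
from bs4 import BeautifulSoup
qualifications = BeautifulSoup(card, "html.parser")
print(qualifications.getText())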
Solution
Use separator="\n" to get the desired output:
qualifications.getText(separator="\n")
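As a minimal sketch of what the separator argument does: get_text() joins every text node in the tree, and the separator is placed between each pair of nodes. That includes the boundary between a <span> label and the text that follows it, so each label and its value land on separate lines. The shortened card below (class attributes dropped, two items only) is an assumption made for illustration.

```python
from bs4 import BeautifulSoup

# Shortened version of the single-line sample (illustrative assumption).
card = (
    '<ul><li><span>Minimum Qualification:</span> Bachelor</li>'
    '<li><span>Experience Level:</span> Graduate trainee</li></ul>'
)
soup = BeautifulSoup(card, "html.parser")

# Default: text nodes are concatenated with nothing between them.
plain = soup.get_text()

# With a separator: "\n" is inserted between every pair of text nodes,
# not only between <li> elements.
separated = soup.get_text(separator="\n")

print(repr(plain))
print(repr(separated))
```

If the label and value should stay on one line, collecting the text per <li> element is likely the better fit.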
Edit 1:
>>> card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300"><li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li><li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li><li><span class="font--weight-500">Experience Length:</span> 1 year</li></ul>
"""
>>> qualifications = BeautifulSoup(card, "html.parser")
>>> for li in qualifications.find_all('li'):
...     print(li.get_text())
...
Minimum Qualification: Bachelor
Experience Level: Graduate trainee
Experience Length: 1 year
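The per-item loop above could be wrapped in a small helper so that the result does not depend on how the HTML is laid out. This is a sketch only: list_items_text is a hypothetical name, not part of Beautiful Soup, and the shortened card strings are illustrative assumptions.

```python
from bs4 import BeautifulSoup


def list_items_text(html):
    """Return the text of each <li> on its own line (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) strips each text node and joins the pieces
    # with a single space, so the label and its value stay on one line.
    return "\n".join(li.get_text(" ", strip=True) for li in soup.find_all("li"))


# The same list written on one line and across several lines.
one_line = (
    '<ul><li><span>Minimum Qualification:</span> Bachelor</li>'
    '<li><span>Experience Length:</span> 1 year</li></ul>'
)
multi_line = """
<ul>
<li><span>Minimum Qualification:</span> Bachelor</li>
<li><span>Experience Length:</span> 1 year</li>
</ul>
"""

result_one = list_items_text(one_line)
result_multi = list_items_text(multi_line)
print(result_one)
```

Because only the <li> elements are read, whitespace between the tags does not affect the result, and both layouts should give the same text.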