How to extract text from between HTML tags?

Problem description

I have some HTML elements that I want to extract the text from. The HTML looks like this:

<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>

</pre>

I want to extract the text as:

ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in<module>()

I found an answer to this question, but it does not work for me. Here is the complete example code:

from bs4 import BeautifulSoup as BSHTML

bs = BSHTML("""<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>""")
print bs.font.contents[0].strip()

I get the following error:

Traceback (most recent call last):
  File "invest.py", line 13, in <module>
    print bs.font.contents[0].strip()
AttributeError: 'NoneType' object has no attribute 'contents'

Am I missing something? BeautifulSoup version: 4.6.0

Tags: python, html, beautifulsoup

Solution


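The markup contains no font tag, so bs.font is None, and accessing .contents on it raises the AttributeError. Do you want all the text content of that pre block? If so, read it from the pre tag directly: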

print(bs.pre.text)

Returns:

ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in <module>()
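
If you want something a bit more defensive, here is a minimal sketch of the same idea. It assumes the built-in "html.parser" backend is acceptable for your input. The helper name extract_pre_text is made up for illustration and is not part of BeautifulSoup. It checks that the tag exists before reading it, which avoids the 'NoneType' error from the question, and strips the newlines that surround the text inside the pre block.

```python
from bs4 import BeautifulSoup

HTML = """<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>"""


def extract_pre_text(html):
    """Return the text inside the first <pre> tag, or None if there is none.

    Hypothetical helper for illustration only.
    """
    # Name the parser explicitly so the result does not depend on
    # which optional parsers (lxml, html5lib) happen to be installed.
    soup = BeautifulSoup(html, "html.parser")

    # find() returns None when the tag is missing, just like soup.font
    # did in the question, so check before touching the result.
    pre = soup.find("pre")
    if pre is None:
        return None

    # get_text() concatenates the text of all nested tags and decodes
    # entities such as &lt; and &gt;.
    return pre.get_text().strip()


text = extract_pre_text(HTML)
print(text)
```

The single-argument print(...) form runs on both Python 2 and Python 3. If the page has several pre blocks, soup.find_all("pre") returns all of them and the same get_text() call can be applied to each one.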
