python - Why is the text inside an inner tag ignored, and how do I fix it?

Problem description

    <p>The latest media Tweets from Yohir Akerman (@yohirakerman). My bio changes all the time. /// akermancolumnista<strong>@gmail.com</strong>. Airplane</p>

I tried to extract the whole text like this:

    # the XPath string must open and close with the same quote character
    body = response.xpath('//*[@id="b_results"]/p/text()').getall()
    print(body)

The output I get is:

['The latest media Tweets from Yohir Akerman (@yohirakerman). My bio changes '
 'all the time. /// akermancolumnista',
 '. Airplane']

All of the text inside the <strong> tag is ignored. How do I fix this?
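The reason is that text() in p/text() selects only the text nodes that are direct children of <p>. In the snippet, "@gmail.com" is the text of <strong>, which makes it a grandchild of <p>, so it is skipped; the two strings in the output are the text before <strong> and the text after it (its "tail"). The sketch below illustrates that node layout with the standard library's xml.etree.ElementTree, used here only as a dependency-free stand-in for Scrapy's selectors, not as the API from the question. In XPath terms, p//text() (double slash) selects all descendant text nodes instead, so joining its results should give the full string.

```python
import xml.etree.ElementTree as ET

# The <p> element from the question; it happens to be well-formed XML.
SNIPPET = (
    "<p>The latest media Tweets from Yohir Akerman (@yohirakerman). "
    "My bio changes all the time. /// akermancolumnista"
    "<strong>@gmail.com</strong>. Airplane</p>"
)

p = ET.fromstring(SNIPPET)
strong = p.find("strong")

# Direct text children of <p>: the text before <strong> and the tail after it.
# These match the two strings that p/text() returned in the question.
direct_text = [p.text, strong.tail]

# "@gmail.com" lives one level deeper, as the text of <strong>.
nested_text = strong.text

# itertext() walks all descendant text, comparable to XPath p//text().
full_text = "".join(p.itertext())

print(direct_text)
print(nested_text)
print(full_text)
```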

Tags: python, html, beautifulsoup

Solution

Don't use text() in the XPath:

    body = response.xpath('//*[@id="b_results"]/p').getall()
    print(body)

Then join the resulting HTML strings in body and strip all tags from them.
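A minimal sketch of the "join and strip tags" step, using only the standard library's html.parser. The helper name strip_tags, the class _TextCollector, and the hard-coded body list (standing in for what .getall() would return for the <p> element) are illustrative assumptions, not part of Scrapy.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: keeps only character data, discarding every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags, including the text inside <strong>.
        self.parts.append(data)


def strip_tags(html_fragments):
    """Join HTML fragments (e.g. the list from .getall()) and drop all tags."""
    parser = _TextCollector()
    parser.feed("".join(html_fragments))
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Assumed stand-in for response.xpath('//*[@id="b_results"]/p').getall()
body = [
    "<p>The latest media Tweets from Yohir Akerman (@yohirakerman). "
    "My bio changes all the time. /// akermancolumnista"
    "<strong>@gmail.com</strong>. Airplane</p>"
]
print(strip_tags(body))
```

Because the pieces are joined with an empty string, "akermancolumnista@gmail.com" stays intact. If the XPath matches several <p> elements, joining their cleaned texts with a separator such as " " may read better.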
