python - Extracting text from an lxml document with BeautifulSoup
Problem description
How can I extract the text from this lxml document, starting at <div class="ember-view" id="ember760">? Please help. I tried the following code, but it does not capture the text.
Code I tried
# soup is a BeautifulSoup object
exp = soup.find('header', {'class': 'pv-profile-section__card-header'})
exp
lxml document
<div class="pv-recommendation-entity__highlights">
<blockquote class="pv-recommendation-entity__text relative">
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
<a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
</blockquote>
</div>
</li>
</ul>
<!-- --></div>
</div></div>
Expected output
I know Abc from Data Analysis training sessions with abc,
is an enthusiastic candidature in training sessions. He is an
extremely capable and dedicated entry-level Data Science Analyst.
He is enhancing Analytics skills by his enthusiasm for learning new
things, and has learnt new tools like R, SPSS, and Pytho
Solution
You can use the CSS selector div#ember760 to select <div class="ember-view" id="ember760">, then call the .get_text() method on it:
from bs4 import BeautifulSoup
txt = '''
<div class="pv-recommendation-entity__highlights">
<blockquote class="pv-recommendation-entity__text relative">
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line">Abc
is an enthusiastic candidature in training sessions. He is an</span>
<span class="lt-line-clamp__line">extremely capable and dedicated entry-level Data Science Analyst.</span>
<span class="lt-line-clamp__line">He is enhancing Analytics skills by his enthusiasm for learning new</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
<a aria-expanded="false" class="lt-line-clamp__more" data-test-line-clamp-show-more-button="true" href="#" id="line-clamp-show-more-button" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
</blockquote>
</div>
</li>
</ul>
<!-- --></div>
</div></div>'''
soup = BeautifulSoup(txt, 'lxml')
print(soup.select_one('div#ember760').get_text(strip=True, separator='\n'))
Prints:
I know Abc from Data Analysis training sessions with abc,
Abc
is an enthusiastic candidature in training sessions. He is an
extremely capable and dedicated entry-level Data Science Analyst.
He is enhancing Analytics skills by his enthusiasm for learning new
things, and has learnt new tools like R, SPSS, and Pytho
...
See more
...
See more
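The output above still contains the "..." and "See more" fragments, which do not appear in the expected output. One possible way to drop them, sketched here under the assumption that those fragments always live inside span elements with the class lt-line-clamp__ellipsis (as they do in the sample markup), is to remove those spans before calling .get_text(). The shortened HTML, the variable names, and the use of the built-in 'html.parser' instead of 'lxml' are choices made for this illustration only:

```python
from bs4 import BeautifulSoup

# Shortened version of the sample markup (illustration only).
html = '''
<div class="ember-view" id="ember760"> <span class="lt-line-clamp__line">I know Abc from Data Analysis training sessions with abc,</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
things, and has learnt new tools like R, SPSS, and Pytho<span class="lt-line-clamp__ellipsis">...
<a class="lt-line-clamp__more" href="#" role="button">See more</a>
</span></span>
<!-- --><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">... <a class="lt-line-clamp__more" href="#" role="button">See more</a></span></div>
'''

# 'html.parser' ships with Python; 'lxml' should work the same way here.
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div#ember760')

# Assumption: every "... See more" fragment sits inside a span with this class.
for span in div.select('span.lt-line-clamp__ellipsis'):
    span.decompose()  # remove the span and everything inside it from the tree

text = div.get_text(strip=True, separator='\n')
print(text)
```

Note that decompose() modifies the parsed tree in place, so run it on a copy of the soup if the removed elements are needed later.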