bs4 - How do I find only the paragraphs that contain quotes?

Problem description

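I'm trying to figure out how to find all the paragraphs on a website (https://www.shortlist.com/news/most-ridiculous-trump-quotes-ever) and store only the ones that contain quotes. I'm using bs4 to parse the data and requests to fetch the site's content. I don't know how to store only the paragraphs that have quotes in them. Any help or guidance would be greatly appreciated. In the end, I basically want to store all the quotes in a text file.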

Tags: python, python-3.x, parsing, beautifulsoup, python-requests

Solution


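The simplest solution is to select only the first p paragraph after each h3 heading that starts with "On" (that way you skip the unrelated h3 headings).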

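# soup is the BeautifulSoup object for the page; keys are the "On ..." headings, values are the paragraph that follows each one
{h3.text: h3.find_next("p").text for h3 in soup.select("div > h3") if h3.text.startswith("On")}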

Sample output:

{'On domestic policy': '"I think if this country gets any kinder or gentler, it\'s literally going to cease to exist."\r\n',
 'On immigration': '“Why are we having all these people from shithole countries coming here?”\r\n',
 'On Syrian refugees': '"What I won\'t do is take in two hundred thousand Syrians who could be ISIS... I have been watching this migration. And I see the people. I mean, they\'re men. They\'re mostly men, and they\'re strong men. These are physically young, strong men. They look like prime-time soldiers. Now it\'s probably not true, but where are the women?... So, you ask two things. Number one, why aren\'t they fighting for their country? And number two, I don\'t want these people coming over here."\r\n',
 'On border control': '"I will build a great, great wall on our southern border, and I will have Mexico pay for that wall. Mark my words."\r\n',
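
Since the question also asks for the quotes to end up in a text file, here is a minimal end-to-end sketch of the same heading-based approach. The names extract_quotes and save_quotes, the SAMPLE_HTML snippet, and the quotes.txt file name are hypothetical and only serve the illustration; the real page's markup may differ from the sample, so treat this as a starting point rather than a definitive implementation.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup (an assumption, not the real page).
# For the live site you would fetch the HTML first, e.g.:
#   html = requests.get("https://www.shortlist.com/news/most-ridiculous-trump-quotes-ever").text
SAMPLE_HTML = """
<div>
  <h3>On border control</h3>
  <p>"I will build a great, great wall."</p>
  <h3>Related stories</h3>
  <p>Not a quote.</p>
  <h3>On immigration</h3>
  <p>"Why are we having all these people coming here?"</p>
</div>
"""


def extract_quotes(html):
    """Map each h3 heading starting with "On" to the text of the next p element."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = {}
    for h3 in soup.select("div > h3"):
        heading = h3.get_text(strip=True)
        if not heading.startswith("On"):
            continue  # skip headings that do not introduce a quote
        p = h3.find_next("p")
        if p is not None:
            # strip=True also drops the trailing "\r\n" seen in the sample output
            quotes[heading] = p.get_text(strip=True)
    return quotes


def save_quotes(quotes, path):
    """Write one "heading: quote" line per entry to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as f:
        for heading, quote in quotes.items():
            f.write(f"{heading}: {quote}\n")


quotes = extract_quotes(SAMPLE_HTML)
```

Calling save_quotes(quotes, "quotes.txt") afterwards writes the result to disk. Opening the file with encoding="utf-8" matters here because some quotes on the page use curly quotation marks.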
