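python - Python BeautifulSoup: paragraph text only
Question
I'm new to anything related to web scraping, and as far as I understand, Requests and BeautifulSoup are one way to do it. I want to write a program that sends me just one paragraph from a given link every few hours (I'm trying a new way to read blogs throughout the day). Say this particular link, 'https://fs.blog/mental-models/', has one paragraph for each model.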
from bs4 import BeautifulSoup
import re
import requests
url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
Now soup contains a wall of markup before the paragraph text begins: <p> this is what I want to read </p>
soup.title.string
works fine, but I don't know how to move on from here... any pointers?
Thanks
Solution
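Loop over soup.find_all('p') (findAll is the older alias of the same method) to find all the p tags, then use .text to get their text.
Also, since you don't want the footer paragraphs, do all of this inside the div with the class rte: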
from bs4 import BeautifulSoup
import requests

url = 'https://fs.blog/mental-models/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Restrict the search to the main content container(s): <div class="rte">
divTags = soup.find_all("div", {"class": "rte"})
for div in divTags:
    # Collect every <p> nested inside this div
    pTags = div.find_all('p')
    for p in pTags[:-2]:  # to trim the last two irrelevant looking lines
        print(p.text)
Output:
Mental models are how we understand the world. Not only do they shape what we think and how we understand but they shape the connections and opportunities that we see.
.
.
.
5. Mutually Assured Destruction
Somewhat paradoxically, the stronger two opponents become, the less likely they may be to destroy one another. This process of mutually assured destruction occurs not just in warfare, as with the development of global nuclear warheads, but also in business, as with the avoidance of destructive price wars between competitors. However, in a fat-tailed world, it is also possible that mutually assured destruction scenarios simply make destruction more severe in the event of a mistake (pushing destruction into the “tails” of the distribution).
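The same filtering can also be expressed with a CSS selector. The following is only a minimal sketch: it assumes the content container is a div with the class rte, as in the answer above, and it runs against a small made-up HTML sample (SAMPLE_HTML is invented for illustration) so that it works without network access. The helper name extract_paragraphs is hypothetical, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page. Only the "rte"
# class name comes from the answer above; everything else is invented.
SAMPLE_HTML = """
<html><head><title>Sample</title></head><body>
<div class="rte">
  <p>First paragraph.</p>
  <p>   </p>
  <p>Second paragraph.</p>
</div>
<footer><p>Footer text that should be skipped.</p></footer>
</body></html>
"""


def extract_paragraphs(html, container_class="rte"):
    """Return the text of every non-empty <p> inside div.<container_class>."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.rte p" matches <p> tags that are descendants of <div class="rte">.
    p_tags = soup.select(f"div.{container_class} p")
    texts = [p.get_text(" ", strip=True) for p in p_tags]
    # Drop paragraphs that contain only whitespace.
    return [text for text in texts if text]


paragraphs = extract_paragraphs(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Returning a list rather than printing inside the loop makes it easier to pick out a single paragraph later, for example paragraphs[0] for the first one. For the real page, the HTML could come from requests.get(url).content as in the answer, though the number of trailing paragraphs to trim would still need to be checked by hand.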