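python - How do I get the text of the second p tag when the article tags share the same class?
Problem description
I am using BeautifulSoup to scrape data from a website.
I want to get the title and the abstract from the page and store them. I can get the title, but I am having trouble extracting the abstract, because the abstract sits in an article tag with the same CSS class as the title's article tag.
This is what I have tried so far: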
import requests
from bs4 import BeautifulSoup

tempURL = 'http://www.globalbigdataconference.com/santa-clara/global-artificial-intelligence-virtual-conference-125/speaker-details/aaron-burciaga-114059.html'
page = requests.get(tempURL)
soup = BeautifulSoup(page.content, 'lxml')
Tag = soup.find_all('h4', class_='clearfix Roboto-Medium font13 sbl-t t-b-m0')
Value = soup.find_all('h4', class_='clearfix Roboto-Medium font15 sbl-t t-b-m0 dks-t l-h20')
topic = soup.find('article', class_='clearfix font14 dkg-t Roboto-Regular t-p15 l-h26')
# print('Topic:' + topic.text)
# The index below counts every <p> that is a direct child of any <article> on the page
abstract = soup.select('article > p')[2].get_text()
print(abstract)
The problem I am facing is that soup.select('article > p')[2] keeps reading from the whole page, whereas I want it to read only the abstract.
Solution
You can use .contents to get the text you need.
A tag's children are available in a list called .contents (see the Beautiful Soup docs).
Here is how to do it.
import requests
from bs4 import BeautifulSoup

url = 'http://www.globalbigdataconference.com/santa-clara/global-artificial-intelligence-virtual-conference-125/speaker-details/aaron-burciaga-114059.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
# Several articles share this class; take the second match (index 1), which holds the abstract
p = soup.find_all('article', class_='clearfix font14 dkg-t Roboto-Regular t-p15 l-h26')[1]
# .contents[0] is the first child node of the <p> inside that article
abstract = p.find('p').contents[0]
print(abstract)
During this session, Aaron Burciaga CAP, will review the methods, critical components, emerging technology and innovative methods for designing and building artificial intelligence and maching learning systems that are “made to stick” by driving adoption and adherence principles as much with the users as with the engineers. Having developed Analytics Centers of Excellence for Fortune 100 Companies, growing and leading teams of over 400 data scientists, and being key advisor to government officials on the establishment of AI programs, Aaron will share how to deliver more “Practical AI”.
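The same idea can be tried offline on a small inline snippet, which may help when the live page is unavailable or its markup changes. This is only a sketch: the HTML below, its shortened class names, and its text are made up to mimic two article tags sharing one class, and html.parser is used instead of lxml so nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (not the real page): two <article> tags share one class.
HTML = """
<div>
  <article class="clearfix font14"><p>Talk title</p></article>
  <article class="clearfix font14"><p>Abstract text.<br/>Extra line</p></article>
  <article class="other"><p>Unrelated paragraph</p></article>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Document-wide selection: the index counts every <p> that is a direct
# child of any <article>, so [2] is not tied to one particular article.
all_paragraphs = soup.select("article > p")

# Scoped selection: pick the second article with the shared class first,
# then look only inside it.
articles = soup.find_all("article", class_="clearfix font14")
second_article = articles[1]

# .contents is the list of child nodes; element 0 is the text before <br/>.
first_child = second_article.find("p").contents[0]
abstract = str(first_child).strip()
print(abstract)
```

Narrowing to the right article before indexing keeps the lookup independent of how many other article > p pairs appear elsewhere on the page.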