How do I get the second text from the p tags inside article tags that share the same class?

Problem Description

I'm using BeautifulSoup to scrape data from a website.

I want to get the title and the abstract from the page and store them. I can get the title, but I'm having trouble extracting the abstract, because it sits in an article tag with the same CSS class as the title's article tag.

Website: http://www.globalbigdataconference.com/santa-clara/global-artificial-intelligence-virtual-conference-125/speaker-details/aaron-burciaga-114059.html

What I've tried so far:

import requests
from bs4 import BeautifulSoup

tempURL = 'http://www.globalbigdataconference.com/santa-clara/global-artificial-intelligence-virtual-conference-125/speaker-details/aaron-burciaga-114059.html'

page = requests.get(tempURL)
soup = BeautifulSoup(page.content, 'lxml')
# Speaker name and title live in h4 tags with these classes
Tag = soup.find_all('h4', class_='clearfix Roboto-Medium font13 sbl-t t-b-m0')
Value = soup.find_all('h4', class_='clearfix Roboto-Medium font15 sbl-t t-b-m0 dks-t l-h20')
# The topic and the abstract both use this same article class
topic = soup.find('article', class_='clearfix font14 dkg-t Roboto-Regular t-p15 l-h26')
# print('Topic: ' + topic.text)
abstract = soup.select('article > p')[2].get_text()
print(abstract)

The problem I'm facing is that ('article > p')[2] keeps reading from the full page; I want it to read only the abstract.
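
For context, soup.select('article > p') matches every <p> that is a direct child of any <article> anywhere in the document, so the index [2] depends on the page-wide order of paragraphs rather than on the abstract block itself. One way to see which element actually holds the abstract is to enumerate the matching articles first (a minimal debugging sketch, reusing the soup built above):

for i, art in enumerate(soup.find_all('article', class_='clearfix font14 dkg-t Roboto-Regular t-p15 l-h26')):
    # Print each article's index and a short preview of its text
    print(i, art.get_text(strip=True)[:60])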


Tags: python, beautifulsoup, google-colaboratory

Solution


You can use .contents to get the text you need.

A tag's children are available in a list called .contents (see the BeautifulSoup docs).
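
As a quick standalone illustration of .contents (a toy snippet, not taken from the page above):

from bs4 import BeautifulSoup

html = '<p>Hello <b>world</b>!</p>'
snippet = BeautifulSoup(html, 'lxml')
p = snippet.find('p')
# .contents holds the tag's direct children in order: a text node,
# the <b> tag, then another text node.
print(p.contents)     # ['Hello ', <b>world</b>, '!']
print(p.contents[0])  # 'Hello '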

Here's how to do it:

import requests
from bs4 import BeautifulSoup

url = 'http://www.globalbigdataconference.com/santa-clara/global-artificial-intelligence-virtual-conference-125/speaker-details/aaron-burciaga-114059.html'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
# Two articles share this class; the second one ([1]) holds the abstract
p = soup.find_all('article', class_='clearfix font14 dkg-t Roboto-Regular t-p15 l-h26')[1]
# .contents[0] is the first child of the <p>: the abstract's text node
abstract = p.find('p').contents[0]
print(abstract)
During this session, Aaron Burciaga CAP, will review the methods, critical components, emerging technology and innovative methods for designing and building artificial intelligence and maching learning systems that are “made to stick” by driving adoption and adherence principles as much with the users as with the engineers. Having developed Analytics Centers of Excellence for Fortune 100 Companies, growing and leading teams of over 400 data scientists, and being key advisor to government officials on the establishment of AI programs, Aaron will share how to deliver more “Practical AI”.
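
One caveat: .contents[0] works here because this <p> begins with a plain text node. If the paragraph ever started with a nested tag (a <strong>, say), contents[0] would return that tag rather than a string. Calling get_text() instead is a more defensive variant (a small sketch, reusing p from the solution above):

# get_text() flattens all descendants to plain text, so it works
# whether the <p> starts with a text node or a nested tag.
abstract = p.find('p').get_text(strip=True)
print(abstract)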
