首页 > 解决方案 > python web抓取,提取标签的内部元素

问题描述

我想从在线购物网站上抓取产品和价格,需要帮助来提取标签之间的字符串

import bs4
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
my_url='https://www.flipkart.com/cameras/mirrorless~type/pr?sid=jek%2Cp31'
cl=urlopen(my_url)
page_html=cl.read()
ps=soup(page_html,'html5lib')
ps1=(ps.prettify())
cn=ps.findAll('div',{'class':'_1-2Iqu row'})
len(cn)                     
cn[0].div.div

#output-"<div class="_3wU53n">Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM</div>
#i need Canon M50 Mirrorless Camera Body with Single Lens EF-M 15-45 mm ISSTM

标签: pythonweb-scraping

解决方案


将 cn=ps.findAll('div',{'class':'_1-2Iqu row'}) 替换为 cn=ps.findAll('div',{'class':'_1-2Iqu row'},text=真的)


推荐阅读