web-scraping - Getting ratings from aria-label with Beautiful Soup
Problem description
I have a soup object, for example:
r = requests.get('https://www.yelp.com/biz/panera-bread-markham')
soup = BeautifulSoup(r.text, 'html.parser')
I am trying to find the ratings with the following code:
rating_list = soup.find_all('span', {"class":"lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"})
rating_list
The output is a list like this:
[<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="3 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--large-3__373c0__2oM4P border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="4 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-4__373c0__3acau border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="5 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-5__373c0__ySHIl border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="3 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-3__373c0__1DXMK border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><p class="lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--mid__373c0__3G312 text-align--left__373c0__2pnx_ text-size--small__373c0__3SGMi"><span aria-hidden="true" class="lemon--span__373c0__3997G icon__373c0__ehCWV icon--18-check-in" style="width:18px;height:18px;fill:#0077bc"><svg class="icon_svg" height="18" viewbox="0 0 18 18" width="18" xmlns="http://www.w3.org/2000/svg"><path d="M18 9l-2.136-1.84.932-2.66-2.772-.525-.524-2.77-2.66.93L8.997 0 7.163 2.136 4.5 1.206l-.525 2.77-2.77.524.932 2.66L0 9l2.137 1.84-.932 2.66 2.77.525.526 2.77 2.664-.932L8.998 18l1.84-2.137 2.662.932.524-2.77 2.772-.524-.932-2.66L18 9zm-9.85 3.23L5.324 9.4l1.13-1.13 1.698 1.696 3.396-3.395 1.13 1.134-4.525 4.525z"></path></svg></span> <!-- -->1 check-in</p></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="1 star rating" class="lemon--div__373c0__1mboc i-stars__373c0__Y2F3O i-stars--regular-1__373c0__14nrQ border-color--default__373c0__2oFDT overflow--hidden__373c0__8Jq2I" role="img"><img alt="" class="lemon--img__373c0__3GQUb offscreen__373c0__1KofL" height="560" src="https://s3-media0.fl.yelpcdn.com/assets/public/stars.yelp_design_web.yji-9bec2045845c24d3bff3ddb582884eda.png" width="132"/></div></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><p class="lemon--p__373c0__3Qnnj text__373c0__2pB8f text-color--mid__373c0__3G312 text-align--left__373c0__2pnx_ text-size--small__373c0__3SGMi"><span aria-hidden="true" class="lemon--span__373c0__3997G icon__373c0__ehCWV icon--18-check-in" style="width:18px;height:18px;fill:#0077bc"><svg class="icon_svg" height="18" viewbox="0 0 18 18" width="18" xmlns="http://www.w3.org/2000/svg"><path d="M18 9l-2.136-1.84.932-2.66-2.772-.525-.524-2.77-2.66.93L8.997 0 7.163 2.136 4.5 1.206l-.525 2.77-2.77.524.932 2.66L0 9l2.137 1.84-.932 2.66 2.77.525.526 2.77 2.664-.932L8.998 18l1.84-2.137 2.662.932.524-2.77 2.772-.524-.932-2.66L18 9zm-9.85 3.23L5.324 9.4l1.13-1.13 1.698 1.696 3.396-3.395 1.13 1.134-4.525 4.525z"></path></svg></span> <!-- -->1 check-in</p></span>,
<span class="lemon--span__373c0__3997G display--inline__373c0__1DbOG border-color--default__373c0__2oFDT"><div aria-label="1 star .....
.
.
.
Any suggestions on how to get the rating out of <div aria-label="3 star rating"?
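Solution
There are actually many ways to do this, such as loading the JSON from the script tag or finding the relevant div. But I think the following approach is clear :)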
import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Each review lists its author in a <meta itemprop="author"> tag
    target = soup.find_all("meta", itemprop="author")
    for tar in target:
        # On this page, the next <meta> tag after the author carries the rating
        print(tar['content'], tar.find_next("meta")['content'])


main("https://www.yelp.com/biz/panera-bread-markham")
Output:
Shia L. 4.0
Ryan L. 5.0
Chi K. 3.0
Joan T. 1.0
Nicky D S. 4.0
Matthew K. 3.0
Michelle W. 1.0
Jennifer C. 4.0
Niral P. 3.0
Shajitha R. 1.0
Veronica C. 3.0
Tanveer K. 1.0
Joey J. 2.0
Broadwaygirl M. 1.0
Sheena Y. 3.0
Wendy B. 4.0
Jacqueline L. 2.0
Mi S. 3.0
Sharon M. 2.0
Eduni C. 1.0
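Since the question asked about the aria-label specifically, here is a minimal sketch of the "find the div" route mentioned above. It assumes the rating divs look like the ones in the question's output, i.e. an aria-label of the form "N star rating". The names SAMPLE_HTML, RATING_RE and extract_ratings are made up for illustration, and the sample markup is a trimmed-down stand-in, not the live page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down sample modelled on the markup in the question.
SAMPLE_HTML = """
<span><div aria-label="3 star rating" role="img"></div></span>
<span><div aria-label="4 star rating" role="img"></div></span>
<span><p>1 check-in</p></span>
<span><div aria-label="1 star rating" role="img"></div></span>
"""

# Assumption: labels read "<number> star rating"; decimals are allowed just in case.
RATING_RE = re.compile(r"^(\d+(?:\.\d+)?) star rating$")


def extract_ratings(html):
    """Return the numeric ratings found in aria-label attributes."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    # Filter on the aria-label attribute instead of the generated class names.
    for div in soup.find_all("div", attrs={"aria-label": RATING_RE}):
        match = RATING_RE.match(div["aria-label"])
        ratings.append(float(match.group(1)))
    return ratings


print(extract_ratings(SAMPLE_HTML))
```

Matching on aria-label avoids depending on class names such as lemon--span__373c0__3997G, which look auto-generated and may change between deployments. To try it on the real page, pass r.text from the request in the question instead of SAMPLE_HTML.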