Web scraping with bs4: can't figure out how to get the result

Problem description

I am currently trying to get the value of title="XFX" from the following:

<a class="item-brand" href="https://www.newegg.com/XFX/BrandStore/ID-1669">
                <img alt="XFX" class="lazy-img" data-effect="fadeIn" data-src="//c1.neweggimages.com/Brandimage_70x28//Brand1669.gif" src="//c1.neweggimages.com/WebResource/Themes/2005/Nest/blank.gif" title="XFX">
                </img></a>

Currently I am using this Python code to access it, but it does not find the value:

brand_container = container.findAll("a", {"class":"item-brand"})
    brand = brand_container[0].title

I don't know what to put after brand = brand_container to get the title value.

Tags: python, web-scraping, beautifulsoup

Solution


The title attribute is on the img tag, not on the a tag. You can reach it with find_all, or with the CSS selector method select.

from bs4 import BeautifulSoup
html='''<a class="item-brand" href="https://www.newegg.com/XFX/BrandStore/ID-1669">
                <img alt="XFX" class="lazy-img" data-effect="fadeIn" data-src="//c1.neweggimages.com/Brandimage_70x28//Brand1669.gif" src="//c1.neweggimages.com/WebResource/Themes/2005/Nest/blank.gif" title="XFX">
                </img></a>'''

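# Parse the snippet, then read the title from the <img> nested inside each matching <a>
container = BeautifulSoup(html, 'html.parser')
brand_container = container.find_all("a", class_="item-brand")
for brand in brand_container:
    # find() searches only inside this <a>; find_next() could also match an <img> later in the document
    print(brand.find('img')['title'])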

CSS selector

for brand in container.select(".item-brand>img"):
    print(brand['title'])
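
Why the original attempt returned nothing

It may help to see why brand_container[0].title gave no result: on a bs4 Tag, dotted access such as .title looks for a child tag named title, not for an HTML attribute, so it returns None here. The sketch below illustrates that, and also reads the attribute with .get() so that a link without an image does not raise an error. The helper name brand_titles and the second, empty link are assumptions added for illustration only; they are not part of the original page.

```python
from bs4 import BeautifulSoup

# The first <a> mirrors the question; the second, empty <a> is a
# hypothetical case added to show how a missing <img> is handled.
html = '''<a class="item-brand" href="https://www.newegg.com/XFX/BrandStore/ID-1669">
<img alt="XFX" class="lazy-img" title="XFX">
</a>
<a class="item-brand" href="#"></a>'''

soup = BeautifulSoup(html, "html.parser")


def brand_titles(soup):
    """Return the img title for each a.item-brand, or None when it is missing."""
    titles = []
    for link in soup.find_all("a", class_="item-brand"):
        img = link.find("img")  # search only inside this link
        # .get() returns None instead of raising KeyError if title is absent
        titles.append(img.get("title") if img is not None else None)
    return titles


first_link = soup.find("a", class_="item-brand")
# .title looks for a child <title> tag, not the title attribute
print(first_link.title)
print(brand_titles(soup))
```

Indexing with ['title'] is fine when every brand link is known to contain an image with a title; .get("title") is the more forgiving choice when some listings might lack one.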
