How do I extract text from "data-at"?

Problem description

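I am trying to scrape the Sephora website, but I can't get the information I want. I am trying to extract the name of each perfume, and I have tried two approaches. The first used "brand = soup.find(.....)[...]" outside the loop, which returned "sku_item_brand", and that is not what I want. The second is the code below, but I don't know why it does not work inside the loop. The error I get is: 'NoneType' object is not subscriptable. Can someone please help?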

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.sephora.com/shop/perfume')
soup = BeautifulSoup(source.content, 'html.parser')
perfume_containers = soup.find_all('div', class_="css-12egk0t")
# List to store the scraped data in
brands = []
for container in perfume_containers:
    # The brand
    brand = container.find('span', class_='css-ktoumz')['data-at']
    brands.append(brand)

The HTML code I am trying to extract from

Tags: python, html

Solution


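brand.text returns the text value. Indexing the span with ['data-at'] returns the attribute's value ("sku_item_brand"), not the brand name, which is the text inside the span. The "'NoneType' object is not subscriptable" error appears when find() returns None for a container that has no matching span, which is why the loop in this answer skips those containers.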
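
To make the difference concrete, here is a minimal sketch that runs offline. The sample_html string is a hypothetical stand-in modeled on the class names and the "data-at" value mentioned in the question; the real Sephora markup is not shown here and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): the second container has no brand span.
sample_html = """
<div class="css-12egk0t">
  <span class="css-ktoumz" data-at="sku_item_brand">CHANEL</span>
</div>
<div class="css-12egk0t"></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
containers = soup.find_all("div", class_="css-12egk0t")

first_span = containers[0].find("span", class_="css-ktoumz")
attribute_value = first_span["data-at"]  # the attribute's value, not the brand
text_value = first_span.text             # the text between the span tags

# No matching span here, so find() returns None; subscripting None
# is what raises "'NoneType' object is not subscriptable".
missing_span = containers[1].find("span", class_="css-ktoumz")

print(attribute_value)
print(text_value)
print(missing_span)
```

Subscripting a tag reads one of its attributes, while .text reads its content, so the two calls answer different questions about the same element.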

for container in perfume_containers:
    brand = container.find('span', class_='css-ktoumz')
    try:
        brands.append(brand.text)
    except AttributeError:
        continue

print(brands)

Output

['CHANEL', 'Viktor&Rolf', 'CHANEL', 'Juliette Has a Gun', 'TOM FORD', 'CHANEL', 'Yves Saint Laurent', 'Versace', 'Yves Saint Laurent', 'Chloé', 'Sephora Favorites', 'Valentino']
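
As an alternative sketch, not part of the original answer, the spans could be selected by their "data-at" attribute directly, which avoids both the None check and the dependence on generated class names such as "css-ktoumz". The sample_html string below is again a hypothetical stand-in, and this assumes every brand span on the real page carries data-at="sku_item_brand".

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption), including one container without a brand.
sample_html = """
<div class="css-12egk0t"><span class="css-ktoumz" data-at="sku_item_brand">CHANEL</span></div>
<div class="css-12egk0t"></div>
<div class="css-12egk0t"><span class="css-ktoumz" data-at="sku_item_brand"> TOM FORD </span></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Match spans by attribute value; containers without a brand yield no match.
brand_spans = soup.find_all("span", attrs={"data-at": "sku_item_brand"})

# get_text(strip=True) trims surrounding whitespace from each name.
brand_names = [span.get_text(strip=True) for span in brand_spans]
print(brand_names)
```

Because find_all() only returns elements that exist, there is nothing to skip and no exception to catch.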
