Why does BeautifulSoup return an empty list?

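Question

Everything in my code works, but when I try to get a product's price it keeps returning an empty list. I tried soup.select, find and findAll, and they all return None or an empty list.

Price selector: '#product-price > div > span:nth-child(2) > span.current-price-container > span.current-price'

Open https://www.asos.com/search/?q=jordan and try pasting the selector into the browser console.

The console outputs the price, but my code does not.

See line 36 (the line marked #MY PROBLEM IS HERE).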

import requests
from bs4 import BeautifulSoup

class Scraper:

    def __init__(self):  
        self.headers = {
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "en-US",
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 123.1.0.26.115 (iPhone11,8; iOS 13_3; en_US; en-US; scale=2.00; 828x1792; 190542906)",
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            "X-IG-Capabilities": "3brTvw==",
            "X-IG-Connection-Type": "WIFI",
        }
        self.KEYWORDS = [
            "jordan", "jordan 1", "air jordan", "jordan 3", "jordan 4", "jordan 5", "jordan 6", "dunk", "sb dunk",
            "dunk high", "dunk low",
            "air force", "air force 1", "blazer", "Yeezy", "Travis", "Travis Scott", "Off White", "jordan 1 low",
            "jordan low", "Peso", "virgil", "kanye",
            "kanye west", "powerphase", "university", "grape", "varsity", "jordan 1 mid", "jordan mid", "light grey",
            "shattered", "chicago", "tie dye",
            "dunk low", "air max 90", "air max", "Fear of god", "fog", "supreme", "bape", "off-white"
        ]
    def GetPage(self):
        
        self.request = requests.get('https://www.asos.com/search/?q=jordan', headers=self.headers)
        self.soup = BeautifulSoup(self.request.text, 'html.parser')
        self.GetProductLinks()

    def ScrapProduct(self):
        for link in self.ProductsLinks:
            page = requests.get(link, headers=self.headers)
            self.soup = BeautifulSoup(page.text, 'html.parser')
            self.PRODUCT_NAME = self.soup.select('#aside-content > div.product-hero > h1')
            self.PRODUCT_PRICE = self.soup.select('#product-price > div > span:nth-child(2) > span.current-price-container > span.current-price') #MY PROBLEM IS HERE
            # self.PRODUCT_COLOR = self.soup.select('#product-colour > section > div > div > span')

            print(self.PRODUCT_NAME, self.PRODUCT_PRICE)
        

    def GetProductLinks(self):
        self.FindProducts = self.soup.select('#plp > div > div._3-pwX1m > div > div._3pQmLlY > section > article > a')
        self.ProductsLinks = []
        for product in self.FindProducts:
            self.ProductsLinks.append(product['href'])
        self.ScrapProduct()


Scraper = Scraper()
Scraper.GetPage()

Tags: python, web-scraping, beautifulsoup

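Answer


The code below works. Instead of relying on unreadable, auto-generated CSS selectors, rely on named attributes such as data-auto-id. You will need to adapt it to your own needs: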

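import requests as rq
from bs4 import BeautifulSoup as bs

# Send a desktop browser User-Agent instead of the requests default.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
url = "https://www.asos.com/search/?q=jordan"
resp = rq.get(url, headers=headers)

# Name the parser explicitly so bs4 does not have to guess one.
soup = bs(resp.content, "html.parser")
# Each product tile is an <article> that carries a data-auto-id attribute.
articles = soup.find_all("article", attrs={"data-auto-id": True})

# Within each tile, take the first <div> and the first <span> that carry data-auto-id.
descriptions = [article.find("div", attrs={"data-auto-id": True}).text for article in articles]
prices = [article.find("span", attrs={"data-auto-id": True}).text for article in articles]

for el in zip(descriptions, prices):
    print(el)

# Output: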
#    ('Nike Jordan Jumpman t-shirt in white', '£19.95')
#    ('Nike Jordan Jumpman shorts in black', '£31.95')
#    ('Nike Jordan Jumpman large logo t-shirt in grey', '£24.95')
#    ....
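
The list comprehensions above raise an AttributeError whenever a tile lacks the expected div or span, because find returns None in that case. Below is a minimal sketch of a more forgiving variant of the same attribute-based lookup. It runs against an inline HTML sample rather than the live site; the markup and the data-auto-id values in it (productTile, productTileDescription, productTilePrice) are assumptions made up for illustration and may not match the real page, and text_or_none and extract_products are hypothetical helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
SAMPLE_HTML = """
<section>
  <article data-auto-id="productTile">
    <div data-auto-id="productTileDescription">Sample t-shirt in white</div>
    <span data-auto-id="productTilePrice">£19.95</span>
  </article>
  <article data-auto-id="productTile">
    <div data-auto-id="productTileDescription">Sample shorts in black</div>
  </article>
</section>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_products(html):
    """Collect (description, price) pairs from every <article> that has data-auto-id."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for article in soup.find_all("article", attrs={"data-auto-id": True}):
        # attrs={"data-auto-id": True} matches any value, as long as the attribute exists.
        description = article.find("div", attrs={"data-auto-id": True})
        price = article.find("span", attrs={"data-auto-id": True})
        products.append((text_or_none(description), text_or_none(price)))
    return products


for product in extract_products(SAMPLE_HTML):
    print(product)
```

With this sample, the second tile has no price element, so its pair carries None in place of a price and the loop does not crash. Keeping description and price together in one tuple per tile also avoids the two lists drifting out of step.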

Note that the results can span several pages. To load the others, add a "page" parameter to your URL, like this: https://www.asos.com/search/?page=2&q=jordan
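
The paging idea can be sketched as a small URL builder. This is only an illustration under stated assumptions: build_search_url is a hypothetical helper, and it assumes that the first page needs no explicit page parameter and that later pages follow the ?page=N&q=... pattern shown above. It uses only the standard library and makes no network request; each resulting URL could then be passed to requests.get as in the answer.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.asos.com/search/"


def build_search_url(query, page=1):
    """Return the search URL for a query and a 1-based page number."""
    if page > 1:
        # Mirror the ?page=N&q=... pattern from the example URL.
        params = {"page": page, "q": query}
    else:
        # Assumption: page 1 is the plain search URL without a page parameter.
        params = {"q": query}
    return f"{BASE_URL}?{urlencode(params)}"


# Build the URLs for the first three result pages.
urls = [build_search_url("jordan", page) for page in range(1, 4)]
for url in urls:
    print(url)
```

How many pages exist is not known in advance, so in practice a loop would probably stop once a page yields no articles.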

