python - Why does BeautifulSoup return an empty list
Problem description
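Everything in my code works, but when I try to get a product's price it keeps returning an empty list. I have tried soup.select, find and findAll, and all of them return None or an empty list.
Price selector: '#product-price > div > span:nth-child(2) > span.current-price-container > span.current-price'
Go to https://www.asos.com/search/?q=jordan and try pasting the selector into the browser console.
The console outputs the price, but my code does not.
Check line 36 (the line marked `#MY PROBLEM IS HERE`).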
import requests
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self):
        self.headers = {
            "Accept": "*/*",
            "Accept-Encoding": "gzip, deflate",
            "Accept-Language": "en-US",
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 123.1.0.26.115 (iPhone11,8; iOS 13_3; en_US; en-US; scale=2.00; 828x1792; 190542906)",
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            "X-IG-Capabilities": "3brTvw==",
            "X-IG-Connection-Type": "WIFI",
        }
        self.KEYWORDS = [
            "jordan", "jordan 1", "air jordan", "jordan 3", "jordan 4", "jordan 5", "jordan 6", "dunk", "sb dunk",
            "dunk high", "dunk low",
            "air force", "air force 1", "blazer", "Yeezy", "Travis", "Travis Scott", "Off White", "jordan 1 low",
            "jordan low", "Peso", "virgil", "kanye",
            "kanye west", "powerphase", "university", "grape", "varsity", "jordan 1 mid", "jordan mid", "light grey",
            "shattered", "chicago", "tie dye",
            "dunk low", "air max 90", "air max", "Fear of god", "fog", "supreme", "bape", "off-white"
        ]

    def GetPage(self):
        # Fetch the search results page and parse it
        self.request = requests.get('https://www.asos.com/search/?q=jordan', headers=self.headers)
        self.soup = BeautifulSoup(self.request.text, 'html.parser')
        self.GetProductLinks()

    def ScrapProduct(self):
        # Visit every product page and try to read its name and price
        for link in self.ProductsLinks:
            page = requests.get(link, headers=self.headers)
            self.soup = BeautifulSoup(page.text, 'html.parser')
            self.PRODUCT_NAME = self.soup.select('#aside-content > div.product-hero > h1')
            self.PRODUCT_PRICE = self.soup.select('#product-price > div > span:nth-child(2) > span.current-price-container > span.current-price') #MY PROBLEM IS HERE
            # self.PRODUCT_COLOR = self.soup.select('#product-colour > section > div > div > span')
            print(self.PRODUCT_NAME, self.PRODUCT_PRICE)

    def GetProductLinks(self):
        # Collect the href of every product tile on the search page
        self.FindProducts = self.soup.select('#plp > div > div._3-pwX1m > div > div._3pQmLlY > section > article > a')
        self.ProductsLinks = []
        for product in self.FindProducts:
            self.ProductsLinks.append(product['href'])
        self.ScrapProduct()


# Use a separate name for the instance so the class is not shadowed
scraper = Scraper()
scraper.GetPage()
Solution
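A selector that works in the browser console but not in a script often means the element is added by JavaScript after the page loads: the console queries the live DOM, while requests only sees the HTML the server sent. Whether that is what happens on this particular site is an assumption. The sketch below shows how one might check it. Both HTML strings, the shortened selector and the `selector_matches` helper are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the price selector from the question
PRICE_SELECTOR = "#product-price span.current-price"

# Hypothetical server response: the price container exists but is still empty
RAW_HTML = '<div id="product-price"><span class="current-price-container"></span></div>'

# Hypothetical DOM after the page's scripts have filled in the price
RENDERED_HTML = (
    '<div id="product-price"><span class="current-price-container">'
    '<span class="current-price">£19.95</span></span></div>'
)


def selector_matches(html, selector):
    """Return True if the CSS selector matches at least one element in html."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector)) > 0


if __name__ == "__main__":
    print("raw HTML:", selector_matches(RAW_HTML, PRICE_SELECTOR))
    print("rendered DOM:", selector_matches(RENDERED_HTML, PRICE_SELECTOR))
```

To try this against a real page, pass `requests.get(link, headers=headers).text` as `html`. If the function returns False while the browser console finds the element, the value is probably not in the server's HTML at all, and no selector will find it there.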
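The code below works. Instead of relying on unreadable CSS selectors, rely on named attributes. You will need to adapt the following code to your needs: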
import requests as rq
from bs4 import BeautifulSoup as bs

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0"}
url = "https://www.asos.com/search/?q=jordan"
resp = rq.get(url, headers=headers)
# Name the parser explicitly so bs4 does not have to pick one itself
soup = bs(resp.content, "html.parser")
# Product tiles are <article> elements that carry a data-auto-id attribute
articles = soup.find_all("article", attrs={"data-auto-id": True})
descriptions = [article.find("div", attrs={"data-auto-id": True}).text for article in articles]
prices = [article.find("span", attrs={"data-auto-id": True}).text for article in articles]
for el in zip(descriptions, prices):
    print(el)
# prints
# ('Nike Jordan Jumpman t-shirt in white', '£19.95')
# ('Nike Jordan Jumpman shorts in black', '£31.95')
# ('Nike Jordan Jumpman large logo t-shirt in grey', '£24.95')
# ....
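The same attribute-based lookup can be tried offline on a small HTML string. This is only a sketch: the markup, the class names and the `data-auto-id` values are invented for illustration, and `extract_products` is a hypothetical helper. It also guards against a tile that lacks a price, a case where calling `.text` on the result of `find` would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names and attribute values are made up
SAMPLE_HTML = """
<section>
  <article data-auto-id="productTile">
    <a href="/prd/1">
      <div class="_x1Yz" data-auto-id="productTileDescription">Jumpman t-shirt in white</div>
      <span class="_9aBc" data-auto-id="productTilePrice">£19.95</span>
    </a>
  </article>
  <article data-auto-id="productTile">
    <a href="/prd/2">
      <div class="_x1Yz" data-auto-id="productTileDescription">Jumpman shorts in black</div>
    </a>
  </article>
</section>
"""


def extract_products(html):
    """Return (description, price) tuples; a missing field becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Match on the presence of the attribute, not on generated class names
    for article in soup.find_all("article", attrs={"data-auto-id": True}):
        description = article.find("div", attrs={"data-auto-id": True})
        price = article.find("span", attrs={"data-auto-id": True})
        products.append((
            description.get_text(strip=True) if description else None,
            price.get_text(strip=True) if price else None,
        ))
    return products


if __name__ == "__main__":
    for product in extract_products(SAMPLE_HTML):
        print(product)
```

Matching on the presence of an attribute keeps working when generated class names such as `_3-pwX1m` change between deployments, which is the main reason to prefer it over a long selector chain.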
Be aware that the results sometimes span several pages. Consider adding a "page" parameter to your URL, like this: https://www.asos.com/search/?page=2&q=jordan
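One way to build those paged URLs is sketched below, using only the standard library. The `search_page_url` helper and the `BASE_URL` constant are illustrative names. How many pages exist for a given query is not known in advance, so the page range in the usage example is an arbitrary assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.asos.com/search/"


def search_page_url(query, page=1):
    """Build a search URL with the 'page' and 'q' query parameters."""
    # urlencode escapes the values, e.g. spaces in the query become '+'
    params = {"page": page, "q": query}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Arbitrary range for illustration; the real page count is unknown here
    for page in range(1, 4):
        print(search_page_url("jordan", page))
```

Each URL can then be fetched and parsed the same way as the first page, stopping once a page returns no product tiles.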