How to get all products from a page with BeautifulSoup

Problem description

I want to get all the products on this page:

nike.com.br/snkrs#estoque

My Python code looks like this:

import requests
from bs4 import BeautifulSoup as bs4

produtos = []

def aviso():
    print("Started!")
    request = requests.get("https://www.nike.com.br/snkrs#estoque")
    soup = bs4(request.text, "html.parser")
    links = soup.find_all("a", class_="btn", text="Comprar")
    links_filtred = list(set(links))
    for link in links_filtred:
        # Skip links we have already collected
        if link["href"] not in produtos:
            request = requests.get(link["href"])
            soup = bs4(request.text, "html.parser")
            produto = soup.find("div", class_="nome-preco-produto").get_text()
            print(f"Nome: {produto} Link: {link['href']}\n")
            produtos.append(link["href"])

aviso()

Guys, this code gets products from the page, but since yesterday it no longer gets all of them. I suspect the content is loaded dynamically, but how can I still get it with requests and BeautifulSoup? I don't want to use Selenium or any browser-automation library, and I'd rather not change my code since it's almost finished. How can I do this?

Tags: python, python-3.x, python-2.7, beautifulsoup

Solution


To get the data, you can send requests to:

https://www.nike.com.br/Snkrs/Estoque?p=<PAGE>&demanda=true

where the p= parameter in the URL is the page number, from 1 to 5.

For example, to print the links, you can try:

import requests
from bs4 import BeautifulSoup


url = "https://www.nike.com.br/Snkrs/Estoque?p={page}&demanda=true"

for page in range(1, 6):
    response = requests.get(url.format(page=page))
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup.find_all("a", class_="btn", text="Comprar"))
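The same find_all pattern can be sketched offline against a static snippet, which also shows how to deduplicate the hrefs while keeping their order (the asker's list(set(...)) loses order). The markup below is a stand-in for one page of results; the class name "btn" and the link text "Comprar" follow the selectors used above, but the exact markup of the live page is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for one page of results.
html = """
<div>
  <a class="btn" href="https://www.nike.com.br/produto-a">Comprar</a>
  <a class="btn" href="https://www.nike.com.br/produto-b">Comprar</a>
  <a class="btn" href="https://www.nike.com.br/produto-a">Comprar</a>
  <a class="btn" href="/carrinho">Adicionar</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect hrefs of the "Comprar" buttons, dropping duplicates but
# preserving page order ("string" is the modern name for "text").
seen = set()
links = []
for a in soup.find_all("a", class_="btn", string="Comprar"):
    href = a["href"]
    if href not in seen:
        seen.add(href)
        links.append(href)

print(links)
```

Run across pages 1-5 of the endpoint above, the same loop would accumulate every unique product link.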
