Can't scrape items from this website with Python

Problem Description

I'm trying to scrape all of the clothing items on this site, but I can't. I set 'limit=3' in 'find_all', yet it only gives me 1 result. How can I get all of the results in one request? Please help, I'm stuck on this!

This is the e-commerce site I'm trying to scrape:

import requests
from bs4 import BeautifulSoup

def trendyol():
    url = "https://www.trendyol.com/erkek+kazak--hirka?filtreler=22|175"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"}

    page = requests.get(url, headers=headers).text
    soup = BeautifulSoup(page, "html.parser")

    # Grab up to three product cards from the listing page.
    cards = soup.find_all("div", {"class": "p-card-chldrn-cntnr"}, limit=3)

    for div in cards:
        link = "https://www.trendyol.com/" + div.a.get("href")
        name = div.find("span", {"class": "prdct-desc-cntnr-name hasRatings"}).text

    # These prints sit outside the loop, so they run once, with the
    # values left over from the last matched card.
    print(f'link: {link}')
    print(f'isim: {name}')

trendyol()
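This is also why only one result shows up: the two print calls run after the for loop finishes, so only the last card's link and name ever get printed. A minimal fix, reusing the question's own names, is to move the prints inside the loop:

for div in cards:
    link = "https://www.trendyol.com/" + div.a.get("href")
    name = div.find("span", {"class": "prdct-desc-cntnr-name hasRatings"}).text
    # Printing inside the loop reports every matched card, not just the last.
    print(f'link: {link}')
    print(f'isim: {name}')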

Tags: python, web-scraping, beautifulsoup

Solution


Try this code:

from bs4 import BeautifulSoup
import requests

def trendyol(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"}
    page = requests.get(url, headers=headers).text
    soup = BeautifulSoup(page, "html.parser")

    # The wrapper div holds every product card on the page.
    container = soup.find("div", {"class": "prdct-cntnr-wrppr"})
    for card in container.find_all("div", {"class": "p-card-chldrn-cntnr"}):
        # Product URL, the image's alt text, and the product title.
        print("https://www.trendyol.com" + card.find("a", href=True)["href"])
        print(card.find("div", {"class": "image-container"}).img["alt"])
        print(card.find("span", {"class": "prdct-desc-cntnr-ttl"}).text)

url = "https://www.trendyol.com/erkek+kazak--hirka?filtreler=22%7C175&pi=3"
trendyol(url)

This code prints each product's URL, the alt text of its image, and the product title. Thanks.
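
A single request only returns one page of listings. The URL above ends in &pi=3, which looks like Trendyol's page-index query parameter; treating that reading as an assumption, here is a minimal sketch that walks the first few pages with the trendyol function above:

# Assumes 'pi' is the page-index parameter, as the answer's URL suggests.
base = "https://www.trendyol.com/erkek+kazak--hirka?filtreler=22%7C175&pi={}"
for page_number in range(1, 4):  # pages 1 through 3; widen the range as needed
    trendyol(base.format(page_number))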

