Scraping ASOS product prices with Python and BeautifulSoup

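Question

I am trying to scrape product data from the ASOS e-commerce website, using Python and BeautifulSoup. I have been able to scrape most of the data except the price, and I need help locating it. This is the page I am scraping: https://www.asos.com/asos-design/asos-design-lambswool-crew-neck-jumper-in-wine/prd/14801727?colourwayid=16645056&SearchQuery=&cid=7617 . I have the page's HTML saved in a separate HTML file, and that file is what I am filtering. I tried this: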

product_data['price'] = soup.find('div', {"class":"grid-row rendered"}).find('span', {"class":"current-price-container"}).find('span',{"class":"current-price"}).text

But I get the error AttributeError: 'NoneType' object has no attribute 'find'
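This error means that one of the chained find() calls found no match and returned None, so the next .find() was called on None. A minimal sketch of a guard against that, assuming `root` is a BeautifulSoup object or Tag; the helper name `find_text` is hypothetical and not part of BeautifulSoup:

```python
def find_text(root, *steps):
    """Follow a chain of find() calls and return the stripped text, or None.

    `root` is assumed to be a BeautifulSoup object or Tag. Each step is a
    (tag_name, attrs_dict) pair that is passed straight to find().
    """
    node = root
    for name, attrs in steps:
        if node is None:
            # An earlier find() matched nothing, so stop instead of raising.
            return None
        node = node.find(name, attrs)
    return node.get_text(strip=True) if node is not None else None
```

With this, find_text(soup, ('div', {"class": "grid-row rendered"}), ('span', {"class": "current-price-container"}), ('span', {"class": "current-price"})) would return None instead of raising when any element in the chain is missing. It makes the failure visible but does not make a missing price appear.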

test.py

from bs4 import BeautifulSoup
import html


def Ecom_Scraper():
    # Read the saved page HTML from a local file
    with open("temp.html", "r", encoding='utf-8') as f:
        html_content = f.read()
    # Decode HTML entities before parsing
    html_content = html.unescape(html_content)
    soup = BeautifulSoup(html_content, "html.parser")
    product_data = {}

    product_data['title'] = soup.find('div', {"class":"product-hero"}).find('h1').text
    product_data['code'] = soup.find('div', {"class":"product-code"}).find('p').text
    product_data['description'] = soup.find('div', {"class":"product-description"}).find('ul').text
    product_data['brand'] = soup.find('div', {"class":"brand-description"}).find('p').text
    product_data['aboutme'] = soup.find('div', {"class":"about-me"}).find('p').text
    product_data['lookatme'] = soup.find('div', {"class":"care-info"}).find('p').text

    print(product_data)


Ecom_Scraper()

This is the code from my project's views.py

import html
import json

from bs4 import BeautifulSoup
from django.shortcuts import render, HttpResponseRedirect, redirect, get_object_or_404
from django.http import HttpResponse, Http404
import requests
from rest_framework.response import Response
from rest_framework.views import APIView
from rest_framework import serializers


class GetProductDetail(APIView):
    def post(self, request):
        # The client posts the page HTML in the 'product_data' field
        html_content = request.data.get('product_data')
        soup = BeautifulSoup(html.unescape(html_content), 'html.parser')
        product_data = {}
        product_data['title'] = soup.find('div', {"class":"product-hero"}).find('h1').text
        product_data['code'] = soup.find('div', {"class":"product-code"}).find('p').text
        product_data['description'] = soup.find('div', {"class":"product-description"}).find('ul').text
        product_data['brand'] = soup.find('div', {"class":"brand-description"}).find('p').text
        product_data['aboutme'] = soup.find('div', {"class":"about-me"}).find('p').text
        product_data['lookatme'] = soup.find('div', {"class":"care-info"}).find('p').text

        product_json = json.dumps(product_data, indent=4)
        print(product_json)
        # Return the dict itself; the original wrapped the JSON string in a set literal
        return Response(product_data)

Tags: python, beautifulsoup

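Answer

The price is loaded dynamically, so it is not in the page HTML and you won't get it with BeautifulSoup alone.

Also, are you loading the site from your local drive? If so, that is a mistake, because a saved copy is not the HTML you would get when requesting the page from the server.

Anyway, here is how to get the basic product data and its price: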

import json

import requests
from bs4 import BeautifulSoup

# Send a browser-like user agent with each request
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}

product_url = "https://www.asos.com/asos-design/asos-design-lambswool-crew-neck-jumper-in-wine/prd/14801727"
page = requests.get(product_url, headers=headers)
# The basic product data is embedded in the page as a JSON-LD script tag
soup = BeautifulSoup(page.text, "html.parser").find("script", type="application/ld+json")
product_data = json.loads(soup.string)

print(product_data["name"], product_data["color"], product_data["productID"])

# The price comes from a separate stock/price endpoint, looked up by product ID
price_endpoint = f"https://www.asos.com/api/product/catalogue/v3/stockprice?productIds={product_data['productID']}&store=COM&currency=GBP"

print(requests.get(price_endpoint, headers=headers).json()[0]["productPrice"]["xrp"]["text"])

This prints:

ASOS DESIGN lambswool crew neck jumper in wine Wine 14801727
£35.00
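The lookup [0]["productPrice"]["xrp"]["text"] above raises an exception if the endpoint returns an empty list or a differently shaped body. Below is a minimal sketch of a more defensive lookup, assuming the response has the shape implied by the code above. The helper name `extract_price_text` is hypothetical, and the sample payload is made up for illustration and contains only the fields that code reads, not the full API response.

```python
def extract_price_text(payload):
    """Return the displayed price string from a decoded stockprice response, or None."""
    try:
        return payload[0]["productPrice"]["xrp"]["text"]
    except (IndexError, KeyError, TypeError):
        # Empty list, missing key, or an unexpected body type
        return None


# Hypothetical sample: only the keys used in the answer's lookup
sample = [{"productPrice": {"xrp": {"text": "£35.00"}}}]

print(extract_price_text(sample))  # £35.00
print(extract_price_text([]))      # None
```

In the script above it could be called as extract_price_text(requests.get(price_endpoint, headers=headers).json()), so a changed or empty response yields None rather than a traceback.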
