This code should return the product title, but I get None instead

Problem description

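I'm trying to build an Amazon price tracker by following a YouTube tutorial. I'm new to Python and web scraping. I put together the code below, which should return the product name, but it prints None instead. Could you please help me fix this?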

I tried different URLs, but it still doesn't work.

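import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Nike-Rival-Track-Field-Shoes/dp/B07HYNB7VV/'

# Browser-like User-Agent header (the string literal must stay on one line)
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/57.36 (HTML, like Gecko) Chrome/75.0.30.100 Safari/537.4'}

# Pass headers as a keyword: the second positional parameter of requests.get is `params`
page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

# find() returns None when no element with this id exists in the parsed HTML
title = soup.find(id="productTitle")

print(title)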
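How the headers are passed matters here. In requests, the second positional parameter of requests.get is params, so a call like requests.get(URL, headers) sends the dictionary as query-string parameters, and the request goes out with requests' default python-requests/<version> User-Agent instead of the browser string. The sketch below shows the difference offline by preparing both requests without sending them; the shortened "Mozilla/5.0" value is only an illustration.

```python
import requests

URL = "https://www.amazon.com/Nike-Rival-Track-Field-Shoes/dp/B07HYNB7VV/"
headers = {"User-Agent": "Mozilla/5.0"}  # shortened UA, for illustration only

# requests.get(URL, headers) binds the dict to `params`, i.e. it behaves like this:
positional = requests.Request("GET", URL, params=headers).prepare()
# requests.get(URL, headers=headers) sends the dict as HTTP headers instead:
keyword = requests.Request("GET", URL, headers=headers).prepare()

print(positional.url)                        # the UA ends up in the query string
print(positional.headers.get("User-Agent"))  # None: no custom header was set
print(keyword.url)                           # URL has no query string
print(keyword.headers.get("User-Agent"))     # Mozilla/5.0

# The User-Agent a Session fills in when none is supplied
print(requests.utils.default_user_agent())   # python-requests/<version>
```

Nothing is sent over the network here; Request.prepare() only builds the request so its URL and headers can be inspected.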

Tags: python, web-scraping, beautifulsoup

Solution


I checked the returned HTML and noticed that Amazon sends (somewhat malformed?) HTML that trips up the default html.parser, but with lxml I was able to scrape the title just fine.

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires the lxml package (pip install lxml)

def make_soup(url: str) -> BeautifulSoup:
    res = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0'
    })
    res.raise_for_status()  # raise on 4xx/5xx responses instead of parsing an error page
    return BeautifulSoup(res.text, 'lxml')

def parse_product_page(soup: BeautifulSoup) -> dict:
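    tag = soup.select_one('#productTitle')
    # select_one returns None if the element is missing, so guard before reading .text
    title = tag.text.strip() if tag is not None else None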
    return {
        'title': title
    }

if __name__ == "__main__":
    url = 'https://www.amazon.com/Nike-Rival-Track-Field-Shoes/dp/B07HYNB7VV/'
    soup = make_soup(url)
    info = parse_product_page(soup)
    print(info) 

Output:

{'title': "Nike Men's Zoom Rival M 9 Track and Field Shoes"}
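The fix above depends on lxml being installed; if it is not, BeautifulSoup(..., 'lxml') raises bs4.FeatureNotFound. A minimal sketch of one way to prefer lxml and fall back to the built-in html.parser, assuming the page HTML has already been downloaded. The helper name extract_title and the parser order are illustrative assumptions, not part of the answer above.

```python
from typing import Iterable, Optional

from bs4 import BeautifulSoup, FeatureNotFound


def extract_title(html: str, parsers: Iterable[str] = ("lxml", "html.parser")) -> Optional[str]:
    """Hypothetical helper: return the stripped #productTitle text, or None.

    Parsers are tried in order; the first one that finds the element wins.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed (e.g. lxml missing); try the next one
            continue
        tag = soup.find(id="productTitle")
        if tag is not None:
            return tag.get_text(strip=True)
    # No parser found the element (or none of the parsers was available)
    return None


if __name__ == "__main__":
    sample = '<html><body><span id="productTitle">\n  Nike Shoes  \n</span></body></html>'
    print(extract_title(sample))  # Nike Shoes
```

Returning None instead of raising keeps a missing title distinguishable from a crash; if html.parser is the only parser available, it may still run into the malformed-HTML problem described in the answer.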
