How do I find a specific class when web scraping with bs4?

Problem description

I am trying to write a scraper that collects the product IDs of the products on my website.

import requests
from bs4 import BeautifulSoup

URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}


r = requests.get(URL, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

soup.find('div', {'class':'detail'})
print(soup)

I want to access the element with class="detail", but when I run the script it prints the HTML of the entire page. What am I doing wrong?

Tags: python, web, web-scraping, beautifulsoup, python-requests

Solution


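What went wrong

  • soup holds the whole parsed document, because it was assigned with soup = BeautifulSoup(r.text, 'html.parser'). The return value of soup.find(...) is discarded, so print(soup) prints the entire HTML.
  • Assign the element returned by find to a variable and print that instead: detail = soup.find('div', {'class':'detail'})

Try this: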

import requests
from bs4 import BeautifulSoup

URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}


r = requests.get(URL, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

# Keep the element that find() returns and print it, not the whole soup
detail = soup.find('div', {'class':'detail'})
print(detail)
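
One thing to watch for: find() returns None when nothing matches, so print(detail) may show None if the div is not present in the HTML that requests received. The sketch below illustrates a guarded lookup on a small inline HTML string, so it runs without a network call. SAMPLE_HTML, the product ID in it, and the two helper function names are made up for illustration; the real page's markup is not shown in the question and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.text; the markup and the ID are invented.
SAMPLE_HTML = """
<html><body>
  <div class="header">Shop</div>
  <div class="detail product">Product ID: 12345</div>
</body></html>
"""


def find_detail_text(html):
    """Return the text of the first div with class "detail", or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # class_ is the keyword form of {'class': 'detail'}; it matches any
    # element whose class list contains "detail".
    detail = soup.find('div', class_='detail')
    if detail is None:
        # Nothing matched, so there is no element to read text from.
        return None
    return detail.get_text(strip=True)


def find_detail_text_css(html):
    """Same lookup, written with a CSS selector instead of find()."""
    soup = BeautifulSoup(html, 'html.parser')
    detail = soup.select_one('div.detail')
    return detail.get_text(strip=True) if detail is not None else None


if __name__ == '__main__':
    print(find_detail_text(SAMPLE_HTML))
    print(find_detail_text('<p>no detail div here</p>'))
```

With the real page, the same check would go right after detail = soup.find(...): if detail is None, inspect r.text to see whether the element is actually in the response before trying to read from it.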
