Web scraping returns no results

Question

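I want to print the hrefs of the sneakers listed on everysize.com. I inspected the site to find the href and its class.

The href sits inside an li with class='item span3 reduced reduced--value loaded', and I tried to print it with this code: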

import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.everysize.com/'

headers = {
    'User-Agent': 'my user agent which i deleted for this'
}

r = requests.get('https://www.everysize.com/sneaker-sale/')
soup = BeautifulSoup(r.content, 'lxml')

productlist = soup.find_all('li', class_='item span3 reduced reduced--value loaded')

productlinks = []

for item in productlist:
    for link in item.find_all('a', href=True):
        print(link['href'])

When I run this code in the terminal, the only message I get is: [Done] exited with code=0 in 0.775 seconds. It should have printed the individual hrefs. Can anyone see what I am doing wrong?

Tags: python, web-scraping, web-scraping-language

Answer


To print all the product links on this page, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "https://www.everysize.com/sneaker-sale/"
baseurl = "https://www.everysize.com"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select("a.item-link"):
    print(baseurl + a["href"])

Prints:

https://www.everysize.com/nike-air-force-1-cv1758-100.html
https://www.everysize.com/adidas-originals-ultraboost-20-eg0754.html
https://www.everysize.com/nike-air-force-1-pixel-ck6649-100.html
https://www.everysize.com/nike-air-force-1-07-ct2302-100.html
https://www.everysize.com/nike-air-force-1-07-dd8959-100.html
https://www.everysize.com/nike-air-force-1-gs-sneaker-314192-117.html
https://www.everysize.com/nike-air-max-270-sneaker-ah8050-100.html
https://www.everysize.com/nike-air-max-270-sneaker-ah8050-002.html
https://www.everysize.com/adidas-originals-supercourt-ee6037.html

...
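
A likely reason the original code printed nothing: when `class_` is given a string containing several classes, Beautiful Soup compares it against the exact value of the `class` attribute, so a single extra, missing, or reordered class yields no match. The sketch below illustrates this on a small hypothetical HTML snippet (not the real everysize.com markup). It assumes the `loaded` class is added by JavaScript after the page loads, which I have not verified against the live site. The helper names `exact_class_matches` and `product_links` are made up for this example, and `urljoin` stands in for the plain string concatenation used above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the raw server response. The "loaded"
# class is assumed to be added later by JavaScript, so it is absent here.
SAMPLE_HTML = """
<ul>
  <li class="item span3 reduced reduced--value">
    <a class="item-link" href="/nike-air-force-1-cv1758-100.html">Nike</a>
  </li>
  <li class="item span3">
    <a class="item-link" href="/adidas-originals-supercourt-ee6037.html">Adidas</a>
  </li>
</ul>
"""

BASE_URL = "https://www.everysize.com"


def exact_class_matches(html):
    """Search the way the question does: one exact multi-class string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("li", class_="item span3 reduced reduced--value loaded")


def product_links(html, base_url=BASE_URL):
    """Select by CSS classes instead, which ignores any extra classes."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.select("li.item a.item-link[href]")
    return [urljoin(base_url, a["href"]) for a in anchors]


if __name__ == "__main__":
    # No <li> carries exactly that class string, so nothing is found.
    print(len(exact_class_matches(SAMPLE_HTML)))
    for link in product_links(SAMPLE_HTML):
        print(link)
```

A CSS selector such as `li.item` only requires that the listed classes be present, so it keeps working when the site adds or drops state classes like `reduced`.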
