python - Web scraping returns no results
Problem description
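I want to print the hrefs of the sneakers on everysize.com. I inspected the page for the href and the class: the href sits inside li class='item span3 reduced reduced--value loaded', and I tried to print it with this code: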
import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.everysize.com/'
headers = {
    'User-Agent' : 'my user agent which i deleted for this'
}

r = requests.get('https://www.everysize.com/sneaker-sale/')
soup = BeautifulSoup(r.content, 'lxml')
productlist = soup.find_all('li', class_='item span3 reduced reduced--value loaded')
productlinks = []

for item in productlist:
    for link in item.find_all('a', href=True):
        print(link['href'])
When I run this code in the terminal, I only get the message: [Done] exited with code=0 in 0.775 seconds, but it should have printed the individual hrefs. Can anyone see what I'm doing wrong?
Solution
To print all the links from this site, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = "https://www.everysize.com/sneaker-sale/"
baseurl = "https://www.everysize.com"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

for a in soup.select("a.item-link"):
    print(baseurl + a["href"])
Prints:
https://www.everysize.com/nike-air-force-1-cv1758-100.html
https://www.everysize.com/adidas-originals-ultraboost-20-eg0754.html
https://www.everysize.com/nike-air-force-1-pixel-ck6649-100.html
https://www.everysize.com/nike-air-force-1-07-ct2302-100.html
https://www.everysize.com/nike-air-force-1-07-dd8959-100.html
https://www.everysize.com/nike-air-force-1-gs-sneaker-314192-117.html
https://www.everysize.com/nike-air-max-270-sneaker-ah8050-100.html
https://www.everysize.com/nike-air-max-270-sneaker-ah8050-002.html
https://www.everysize.com/adidas-originals-supercourt-ee6037.html
...
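One likely reason the original find_all call came back empty: when class_ is given a string containing several class names, Beautiful Soup compares it against the exact value of the class attribute, so a single missing or reordered class means no match. The sketch below shows that behaviour on a small inline HTML snippet. The markup is hypothetical, and it assumes (this was not verified against the live site) that the "loaded" class is absent from the HTML the server sends and is only added later in the browser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names in the question.
# Assumption: the server-sent HTML carries no "loaded" class.
html = """
<ul>
  <li class="item span3 reduced reduced--value">
    <a class="item-link" href="/shoe-one.html">Shoe one</a>
  </li>
  <li class="item span3">
    <a class="item-link" href="/shoe-two.html">Shoe two</a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string is matched against the whole class attribute,
# so the extra "loaded" token prevents any match.
exact = soup.find_all("li", class_="item span3 reduced reduced--value loaded")

# A CSS selector only requires the classes it names to be present.
by_selector = [a["href"] for a in soup.select("li.item a.item-link")]

print(len(exact))
print(by_selector)
```

Selecting on one stable class, as the answer does with a.item-link, avoids depending on the full and possibly changing class list.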