python - Scraping and capturing multi-tile product information with BeautifulSoup
Problem description
I'm scraping data from the following website: https://www.nike.com/w/sale-3yaep
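I'm currently extracting the URL of every product shown on the page. My problem is that for products with multiple tiles (i.e. multiple colors), I can't get the information for the other colors. For example, if you find a product that comes in several colors and inspect its HTML, you can hover over the color images and watch the product URL change as you move between colors.
So the problem is that the HTML only contains the product URL for the default color. Is there any workaround to get the data for every color of a multi-color product?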
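My code:
import requests
from bs4 import BeautifulSoup

pages = 'https://www.nike.com/w/sale-3yaep'
page = requests.get(pages, verify=False)  # verify=False skips TLS certificate checks
soup = BeautifulSoup(page.content, 'html.parser')
# Each product card only links to its default color
for prod in soup.find_all('div', {'class': 'product-card__body'}):
    prod_tag = prod.find('a')
    link = str(prod_tag['href'])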
Solution
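The data you see on the page (including every colorway) is stored as JSON inside a <script> tag, so the other colors never appear as product-card elements that BeautifulSoup can find. You can use the re and json modules to parse it: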
import re
import json
import requests

url = "https://www.nike.com/w/sale-3yaep"
html_doc = requests.get(url).text

# The page state is embedded as JSON assigned to window.INITIAL_REDUX_STATE
data = re.search(r"window\.INITIAL_REDUX_STATE=(\{.*\})", html_doc)
data = json.loads(data.group(1))

# uncomment to print all data:
# print(json.dumps(data, indent=4))

for p in data["Wall"]["products"]:
    print(p["title"], p["subtitle"])
    # every colorway of the product, not just the default tile
    colors = p.get("colorways") or []
    for c in colors:
        print(
            "{:<50} {}".format(
                c["colorDescription"],
                # pdpUrl contains a {countryLang} placeholder for the site root
                c["pdpUrl"].format(countryLang="https://www.nike.com"),
            )
        )
    print()
Prints:
Nike Air Max Command Men's Shoe
Black/Neutral Grey/Anthracite https://www.nike.com/t/air-max-command-shoe-bdw7RQ/749760-001
Nike WearAllDay Older Kids' Shoe
Black/Black/Black https://www.nike.com/t/wearallday-older-shoe-xtvQKM/CJ3816-001
Smoke Grey/Pink Glow/Off-Noir/Metallic Copper https://www.nike.com/t/wearallday-older-shoe-xtvQKM/CJ3816-006
White/Black https://www.nike.com/t/wearallday-older-shoe-xtvQKM/CJ3816-101
Nike Downshifter 10 Men's Running Shoe
Black/Iron Grey/Black https://www.nike.com/t/downshifter-10-running-shoe-QL0NBl/CI9981-002
Nike ESC Men's Knit Tracksuit Jacket
Black https://www.nike.com/t/esc-knit-tracksuit-jacket-sHVQ8Q/CW3744-010
None None
Nike Downshifter 11 Women's Running Shoe
White/Pure Platinum/Wolf Grey/Metallic Silver https://www.nike.com/t/downshifter-11-running-shoe-lp6Sh5/CW3413-100
Nike Swoosh Run Women's Mid-Rise 7/8 Running Leggings
Black https://www.nike.com/t/swoosh-run-mid-rise-7-8-running-leggings-6vw38F/DA1145-010
Nike Air Max Zephyr Older Kids' Shoe
Black/Dark Smoke Grey https://www.nike.com/t/air-max-zephyr-older-shoe-GmwbKV/CN8511-001
Smoke Grey/Black/Photon Dust/Siren Red https://www.nike.com/t/air-max-zephyr-older-shoe-GmwbKV/CN8511-003
Black/Sapphire/Sunset Pulse/Metallic Silver https://www.nike.com/t/air-max-zephyr-older-shoe-GmwbKV/CN8511-004
Nike Air Zoom Pulse Shoes
Black/Black/Black https://www.nike.com/t/air-zoom-pulse-shoes-9PlRZ2/CT1629-003
Nike Sportswear Down-Fill Windrunner Men's Jacket
Black/Black/Black/Black https://www.nike.com/t/sportswear-down-fill-windrunner-jacket-hHNjxL/CU4404-010
White/Dark Smoke Grey/Dark Smoke Grey/Black https://www.nike.com/t/sportswear-down-fill-windrunner-jacket-hHNjxL/CU4404-100
Nike Air Max 2090 Women's Shoe
Black/Metallic Silver/White https://www.nike.com/t/air-max-2090-shoe-C0FP38/CK2612-002
White/Wolf Grey/Black https://www.nike.com/t/air-max-2090-shoe-C0FP38/CK2612-100
Nike One Women's Mid-Rise Crop Graphic Leggings
Black/Light Photo Blue/Chile Red https://www.nike.com/t/one-mid-rise-crop-graphic-leggings-LNLjt9/CZ9202-011
Nike Reposto Older Kids' Shoe
Black/Dark Smoke Grey/Iron Grey/White https://www.nike.com/t/reposto-older-shoe-CMDjSc/DA3260-012
Grey Fog/Iron Grey/Volt/Game Royal https://www.nike.com/t/reposto-older-shoe-CMDjSc/DA3260-005
Light Violet/Crimson Bliss/Platinum Tint/Metallic Silver https://www.nike.com/t/reposto-older-shoe-CMDjSc/DA3260-500
Nike City Trainer 3 Women's Training Shoes
Black/Anthracite/White https://www.nike.com/t/city-trainer-3-training-shoes-lChhbP/CK2585-006
Nike Air Zoom Winflo 7 Men's Running Shoe
Black/Anthracite/White https://www.nike.com/t/air-zoom-wio-7-running-shoe-BsFScT/CJ0291-005
Nike ESC Men's Modern Polo
Navy https://www.nike.com/t/esc-modern-polo-0skhPC/CW3747-414
Black https://www.nike.com/t/esc-modern-polo-0skhPC/CW3747-010
Nike Air Max 2X Women's Shoe
White/White/Black https://www.nike.com/t/air-max-2x-shoe-3hqsQl/CK2947-100
Black/Black/White https://www.nike.com/t/air-max-2x-shoe-3hqsQl/CK2947-001
Nike Sportswear Swoosh Women's Woven Trousers
Steam https://www.nike.com/t/sportswear-swoosh-woven-trousers-N3gKw3/CZ8909-006
University Gold https://www.nike.com/t/sportswear-swoosh-woven-trousers-N3gKw3/CZ8909-739
Nike MD Runner 2 Men's Shoe
Midnight Navy/Wolf Grey/White https://www.nike.com/t/md-runner-2-shoes-PATZpBgm/749794-410
Black/Anthracite/White https://www.nike.com/t/md-runner-2-shoes-PATZpBgm/749794-010
Nike Sportswear Essential Women's Dress
Black/White https://www.nike.com/t/sportswear-essential-dress-kdTcH3/CU6509-010
Nike Air Max ZM950 Older Kids' Shoe
Black/Volt/Smoke Grey/White https://www.nike.com/t/air-max-zm950-older-shoe-fhbCg2/CN9835-003
Light Bone/Stone/Sequoia/Citron Pulse https://www.nike.com/t/air-max-zm950-older-shoe-fhbCg2/CN9835-005
Black/Metallic Silver/Bright Crimson/Black https://www.nike.com/t/air-max-zm950-older-shoe-fhbCg2/CN9835-002
Black/Sapphire/White/Sunset Pulse https://www.nike.com/t/air-max-zm950-older-shoe-fhbCg2/CN9835-006
Nike Sportswear Older Kids' (Girls') Crop T-Shirt
Pink Foam/White https://www.nike.com/t/sportswear-older-crop-t-shirt-MM4cdd/DJ4017-663
Nike SB Zoom Blazer Low Pro GT Skate Shoe
Black/Black/Gum Light Brown/White https://www.nike.com/t/sb-zoom-blazer-low-pro-gt-skate-shoe-bMwZmc/DC7695-002
Nike Sportswear Synthetic-Fill Men's Jacket
Black/Black/Black/Sail https://www.nike.com/t/sportswear-synthetic-fill-jacket-63lvqV/CU4422-010
Midnight Navy/Midnight Navy/Midnight Navy/Sail https://www.nike.com/t/sportswear-synthetic-fill-jacket-63lvqV/CU4422-410
Nike SuperRep Groove Women's Cardio Dance Shoes
White/Metallic Gold Coin/Black/Black https://www.nike.com/t/superrep-groove-cardio-dance-shoes-b9WmfB/CT1248-109
White/Chutney/Black https://www.nike.com/t/superrep-groove-cardio-dance-shoes-b9WmfB/CT1248-107
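If you want the colorways as data rather than printed lines, the same idea can be wrapped in a function. This is a minimal sketch, not part of the original answer: it uses BeautifulSoup to find the <script> tag whose text contains window.INITIAL_REDUX_STATE=, then decodes the JSON with json.JSONDecoder.raw_decode, which stops at the end of the first complete JSON value, so any JavaScript after the object is ignored. The key names (Wall, products, colorways, colorDescription, pdpUrl) come from the answer above; the function name extract_colorways is hypothetical, and it assumes the page still embeds its state in this form, which Nike may change at any time.

```python
import json

from bs4 import BeautifulSoup

# Assumed marker, matching the regex in the answer (no spaces around "=")
STATE_MARKER = "window.INITIAL_REDUX_STATE="


def extract_colorways(html_doc, base_url="https://www.nike.com"):
    """Hypothetical helper: return one dict per colorway found in the page state."""
    soup = BeautifulSoup(html_doc, "html.parser")
    # Find the <script> tag whose text assigns the Redux state
    script = soup.find(
        "script", string=lambda s: s is not None and STATE_MARKER in s
    )
    if script is None:
        return []  # state not embedded (layout changed or request blocked)
    text = str(script.string)
    start = text.index(STATE_MARKER) + len(STATE_MARKER)
    # raw_decode parses one JSON value starting at `start` and ignores the rest
    data, _ = json.JSONDecoder().raw_decode(text, start)

    rows = []
    for p in data["Wall"]["products"]:
        # products without colorways (e.g. the "None None" entry) are skipped
        for c in p.get("colorways") or []:
            rows.append(
                {
                    "title": p.get("title"),
                    "color": c["colorDescription"],
                    "url": c["pdpUrl"].format(countryLang=base_url),
                }
            )
    return rows
```

Usage would be something like rows = extract_colorways(requests.get(url).text). Using raw_decode instead of the greedy \{.*\} regex means the parse does not depend on there being no other } later on the same line.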