Scrapy-splash cannot find image source URL

Problem description

I am trying to scrape product pages from ZARA, like this one: https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115

My scrapy-splash container is running. In the Scrapy shell, I fetch the page:

fetch('http://localhost:8050/render.html?url=https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115')
2021-05-14 14:30:42 [scrapy.core.engine] INFO: Spider opened
2021-05-14 14:30:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://localhost:8050/render.html?url=https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115> (referer: None)

So far everything works, and I can get the title and the price. However, I also want to get the product's image URLs.

I tried to get them with:

response.css('img.media-image__image::attr(src)').getall()

But the result looks like this:

['https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png', 'https://static.zara.net/stdstatic/1.211.0-b.44/images/transparent-background.png']

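These are all placeholder background images, not the real ones. The images display fine in my browser, and I can see them arriving via network requests. Is this because they are loaded through AJAX requests? How can I solve this?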
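One thing that may be worth ruling out first: the target URL carries its own query string, so when it is passed unencoded, Splash may read `v2` as one of its own arguments instead of as part of the page URL. The sketch below percent-encodes the target and also sets Splash's `wait` argument so scripts get some time to run. The helper name `splash_url` and the 2-second wait are assumptions chosen for illustration, and waiting alone is not guaranteed to replace lazy-loaded placeholders.

```python
from urllib.parse import urlencode

# Splash endpoint from the question; adjust if the container runs elsewhere.
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_url(target, wait=2.0):
    """Build a render.html URL with the target page safely percent-encoded.

    `wait` is Splash's delay (in seconds) after page load; 2.0 is an arbitrary
    illustrative value, not a recommendation from the original post.
    """
    # urlencode escapes '?', '&' and '=' inside `target`, so the page's own
    # v1/v2 parameters stay inside the `url` argument.
    return SPLASH_RENDER + "?" + urlencode({"url": target, "wait": wait})


page = ("https://www.zara.com/us/en/fitted-houndstooth-blazer-p07808160.html"
        "?v1=108967877&v2=1718115")
print(splash_url(page))
```

In the Scrapy shell this would be used as `fetch(splash_url(page))`.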

Tags: python, web-scraping, scrapy, scrapy-splash

Solution


I only started looking into web scraping last week, so I'm not sure I can be of much help, but I did find a few things.

The page source shows this in a script near the top:

_mkt_imageDir = /BASE_IMAGES_URL=(.*?);/.test(document.cookie) && RegExp.$1 || 'https://static.zara.net/photos/';

And this further down:

"originalUrl":"/us/en/fitted-houndstooth-blazer-p07808160.html?v1=108967877&v2=1718115","imageBaseUrl":"https://static.zara.net/photos/"

And then all the images seem to be here, in the JavaScript:

[{"@context":"http://schema.org/","@type":"Product","sku":"108967877-046-1","name":"FITTED HOUNDSTOOTH BLAZER","mpn":"108967877-046-1","brand":"ZARA","description":"","image":["https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_1_1_1.jpg?ts=1620821843383","https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_2_1_1.jpg?ts=1620821851988","https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_2_2_1.jpg?ts=1620821839280","https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_6_1_1.jpg?ts=1620655538200","https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_6_2_1.jpg?ts=1620655535611","https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/w/1920/7808160046_6_3_1.jpg?ts=1620656141718","https://static.zara.net/photos///contents/cm/w/1920/sustainability-extrainfo-label-JL78_0.jpg?ts=1602602200357"]

I don't know how you would go about scraping them, but I'd be very interested to hear the answer once you find out.
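A possible starting point, offered as a minimal sketch rather than a tested solution: the JSON above looks like schema.org structured data, so if it sits in a `<script type="application/ld+json">` tag (an assumption, since the tag itself is not shown here), its text can be parsed with `json` and the `image` list read directly. The function name `extract_product_images` is hypothetical, and the sample string is a trimmed copy of the JSON quoted above.

```python
import json


def extract_product_images(jsonld_blocks):
    """Collect image URLs from schema.org Product entries in JSON-LD strings."""
    urls = []
    for raw in jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        # A block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            images = item.get("image", [])
            if isinstance(images, str):
                images = [images]  # "image" may be one URL instead of a list
            for url in images:
                if url not in urls:  # keep order, drop duplicates
                    urls.append(url)
    return urls


# Trimmed sample based on the JSON quoted in this answer.
sample = (
    '[{"@context":"http://schema.org/","@type":"Product",'
    '"name":"FITTED HOUNDSTOOTH BLAZER",'
    '"image":["https://static.zara.net/photos///2021/I/0/1/p/7808/160/046/2/'
    'w/1920/7808160046_1_1_1.jpg?ts=1620821843383"]}]'
)
print(extract_product_images([sample]))
```

In the Scrapy shell, the input could come from `response.css('script[type="application/ld+json"]::text').getall()`, again assuming that is the tag holding the data.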

Regards, Samuel

