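python - Renaming downloaded images in Scrapy
Problem description
I'm new to Scrapy, so even very basic things are hard for me. My problem is that I can't rename the images I download. I copied part of my code from this site: "http://scrapingauthority.com/scrapy-download-images/", but it doesn't work. Here is my spider code: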
from scrapy import Request, Spider
from Imagenes.items import ImagenesItem

class AuthorSpider(Spider):
    name = 'imagenpruebarenombrar'
    start_urls = [
        "http://quotes.toscrape.com/",
    ]

    def parse(self, response):
        item = ImagenesItem()
        img_urls = [
            "http://automationpractice.com/img/p/5/5-large_default.jpg",
            "http://automationpractice.com/img/p/6/6-large_default.jpg",
            "http://automationpractice.com/img/p/7/7-large_default.jpg",
        ]
        img_name = [  # These are the names I want for my images
            "1",
            "2",
            "3",
        ]
        item["image_urls"] = img_urls
        item["image_name"] = img_name
        return item
The item code:
import scrapy

class ImagenesItem(scrapy.Item):
    images = scrapy.Field()
    image_urls = scrapy.Field()
    image_name = scrapy.Field()
The pipeline code:
class CustomImageNamePipeline(ImagesPipeline):  # I copied this code from the website
    def get_media_requests(self, item, info):
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]

    def file_path(self, request, response=None, info=None):
        return '%s.jpg' % request.meta['image_name']
My settings:
BOT_NAME = 'Imagenes'
SPIDER_MODULES = ['Imagenes.spiders']
NEWSPIDER_MODULE = 'Imagenes.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = r"C:\Users\Orlando\Imagenes"
Solution
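First, edit your settings.py so that ITEM_PIPELINES registers your custom pipeline instead of the default ImagesPipeline; otherwise the renaming code never runs: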
ITEM_PIPELINES = {'Imagenes.pipelines.CustomImageNamePipeline': 1}
Next, in your pipelines.py, read the URL and the name from each entry so that every request carries exactly one name:
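import scrapy
from scrapy.pipelines.images import ImagesPipeline

class CustomImageNamePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        # Each entry is a dict with 'url' and 'name'; carry the name along in meta.
        for image in item.get('image_urls', []):
            yield scrapy.Request(image["url"], meta={'image_name': image["name"]})

    def file_path(self, request, response=None, info=None, *, item=None):
        # The returned path is relative to IMAGES_STORE; item=None suits newer Scrapy.
        return '%s.jpg' % request.meta['image_name']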
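To see why the pipeline from the question could not produce distinct names even once it is enabled, it may help to look at the string formatting on its own. The plain-Python sketch below needs no Scrapy; `file_path_from_meta` is a hypothetical stand-in that only mirrors the body of `file_path`, and the dicts stand in for `request.meta`.

```python
from pathlib import PureWindowsPath

IMAGES_STORE = r"C:\Users\Orlando\Imagenes"  # value taken from the question's settings


def file_path_from_meta(meta):
    """Hypothetical stand-in that mirrors the body of file_path()."""
    return '%s.jpg' % meta['image_name']


# Question's approach: every request carries the whole list of names,
# so all three images get the same (odd) file name.
broken = file_path_from_meta({'image_name': ["1", "2", "3"]})

# Answer's approach: each request carries exactly one name.
fixed = file_path_from_meta({'image_name': "1"})

# The image is stored under IMAGES_STORE joined with the returned relative path.
final_location = str(PureWindowsPath(IMAGES_STORE) / fixed)

print(broken)
print(fixed)
print(final_location)
```

Because the original version returns the same path for every request, each download would be written to the same file. Passing one name per request is what makes the paths differ.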
Finally, in your spider, pair each URL with its name before filling the item:
def parse(self, response):
    item = ImagenesItem()
    img_urls = [
        "http://automationpractice.com/img/p/5/5-large_default.jpg",
        "http://automationpractice.com/img/p/6/6-large_default.jpg",
        "http://automationpractice.com/img/p/7/7-large_default.jpg",
    ]
    img_names = [  # These are the names I want for my images
        "1",
        "2",
        "3",
    ]
    images = []
    for image_url, image_name in zip(img_urls, img_names):
        images.append({'url': image_url, 'name': image_name})
    item["image_urls"] = images
    yield item
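One thing to keep in mind with the pairing step: `zip` stops at the shorter input, so a missing name would silently drop an image. A minimal sketch of a stricter pairing follows; `pair_images` is a hypothetical helper, not part of Scrapy or of the original answer.

```python
def pair_images(urls, names):
    """Pair each URL with its name, refusing mismatched lists.

    Hypothetical helper: plain zip() would silently ignore the extra
    entries of the longer list, so the lengths are checked first.
    """
    if len(urls) != len(names):
        raise ValueError(
            "got %d URLs but %d names" % (len(urls), len(names))
        )
    return [{'url': url, 'name': name} for url, name in zip(urls, names)]


img_urls = [
    "http://automationpractice.com/img/p/5/5-large_default.jpg",
    "http://automationpractice.com/img/p/6/6-large_default.jpg",
    "http://automationpractice.com/img/p/7/7-large_default.jpg",
]
img_names = ["1", "2", "3"]

# Same structure the spider assigns to item["image_urls"].
images = pair_images(img_urls, img_names)
```

In the spider, the result could be assigned directly with `item["image_urls"] = pair_images(img_urls, img_names)`.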