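python-2.7 - What causes this error? Missing scheme in request url: h
Problem description
When I crawl my web page, the spider produces the output I expect, but processing each item fails with this error: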
ValueError: Missing scheme in request url: h
books2.py
class Books1Spider(Spider):
    name = 'books1'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']
    headers = {
        "Host": "localhost",
        "Connection": "keep-alive",
        "Cache-Control": "max-age=0",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "DNT": "1",
        "Accept-Encoding": "gzip, deflate, sdch",
        "Accept-Language": "en-US,en;q=0.8"
    }

    def parse_book(self, response):
        title = response.xpath('//h1/text()').extract_first()
        price = response.xpath('.//*[@class="price_color"]/text()').extract_first()
        image_urls = response.xpath('.//img/@src').extract_first()
        image_urls = image_urls.replace('../..', 'http://books.toscrape.com/')
        rating = response.xpath('//*[contains(@class,"star-rating")]/@class').extract_first()
        rating = rating.replace('star-rating', '')
        description = response.xpath('//*[@id="product_description"]/following-sibling::p/text()').extract_first()
        yield {
            'title': title,
            'price': price,
            'image_urls': image_urls,
            'rating': rating,
            'description': description,
        }
Expected result:
{'rating': u' Five', 'price': u'\xa352.29', 'description': u'Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get turned upside-down? Will he have to face Ramona\'s seven evil ex-boyfriends in battle? The short answer is yes. The long answer is Scott Pilgrim, Volume 1: Scott Pilgrim\'s Precious Little Life ...more', 'image_urls': u'http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg', 'title': u"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)"}
Actual result:
2019-02-07 16:06:54 [scrapy.core.scraper] ERROR: Error processing {'rating': u' Five', 'price': u'\xa352.29', 'description': u'Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get Scott Pilgrim\'s life is totally sweet. He\'s 23 years old, he\'s in a rockband, he\'s "between jobs" and he\'s dating a cute high school girl. Nothing could possibly go wrong, unless a seriously mind-blowing, dangerously fashionable, rollerblading delivery girl named Ramona Flowers starts cruising through his dreams and sailing by him at parties. Will Scott\'s awesome life get turned upside-down? Will he have to face Ramona\'s seven evil ex-boyfriends in battle? The short answer is yes. The long answer is Scott Pilgrim, Volume 1: Scott Pilgrim\'s Precious Little Life ...more', 'image_urls': u'http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg', 'title': u"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)"}
Traceback (most recent call last):
  File "/home/divum/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/pipelines/media.py", line 79, in process_item
    requests = arg_to_iter(self.get_media_requests(item, info))
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/pipelines/images.py", line 155, in get_media_requests
    return [Request(x) for x in item.get(self.images_urls_field, [])]
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 25, in __init__
    self._set_url(url)
  File "/home/divum/venv/local/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 62, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: h
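Solution
You are extracting image_urls as a single string, u'…'. The value of image_urls must be a list: [u'…']. The images pipeline iterates over that field (see the get_media_requests line in the traceback), and iterating over a string yields one character at a time, so the first request is built for the URL "h", which has no scheme.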
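The failure can be reproduced without Scrapy. The following is only a minimal sketch: `urls_requested` is a hypothetical helper that imitates the list comprehension shown in the traceback, not Scrapy's actual code.

```python
def urls_requested(item, field="image_urls"):
    """Imitate the comprehension from the traceback: iterate over the field."""
    return [x for x in item.get(field, [])]


url = "http://books.toscrape.com//media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg"

bad_item = {"image_urls": url}     # a plain string, as in the question
good_item = {"image_urls": [url]}  # a list holding one URL

# Iterating over a string yields single characters, so the first "URL" is "h".
print(urls_requested(bad_item)[0])
# Iterating over a list yields the whole URL.
print(urls_requested(good_item)[0])
```

With the string, every character would become its own request, and the very first one fails the scheme check.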
In your code, change:
image_urls = response.xpath('.//img/@src').extract_first()
image_urls = image_urls.replace('../..','http://books.toscrape.com/')
to:
image_url = response.xpath('.//img/@src').extract_first()
image_urls = [image_url.replace('../..','http://books.toscrape.com/')]
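As a possible refinement, the relative src can be resolved against the page URL instead of using a string replace, which also avoids the double slash visible in the output above. This is a sketch under assumptions: `absolute_image_urls` is a hypothetical helper and the page URL below is made up for illustration. Inside a Scrapy callback, `response.urljoin(src)` performs the same resolution against the response's own URL.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def absolute_image_urls(page_url, src):
    """Return a list with the absolute image URL, or an empty list if src is None."""
    if src is None:
        # extract_first() returns None when nothing matches; calling
        # .replace() on it would raise AttributeError.
        return []
    return [urljoin(page_url, src)]


# Hypothetical product page URL, used only for this example.
page_url = "http://books.toscrape.com/catalogue/some-book_1/index.html"
src = "../../media/cache/97/27/97275841c81e66d53bf9313cba06f23e.jpg"

print(absolute_image_urls(page_url, src))
```

The helper always returns a list, so the result can be assigned to image_urls directly.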