web-scraping - Scrapy IndentationError: expected an indented block
Question
I hope you are doing well. I need your help, please: I am getting this error and I don't know why:
  File "C:\Users\Luis\Amazon\mercado\spiders\spider.py", line 14
    yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)
    ^
IndentationError: expected an indented block

# -*- coding: utf-8 -*-
import scrapy
import urllib
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    allowed_domain = ['https://www.amazon.es']
    start_urls = ['https://www.amazon.es/s/ref=sr_pg_2rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254']

    def start_requests(self):
    yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)
    for i in range(2,400):
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)

    def parse_item(self, response):
        ml_item = MercadoItem()
        #info de producto
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
        self.item_count += 1
        yield ml_item

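Do you know why? I have included the code here to make it easier.
Answer
You have an indentation error: the body of start_requests must be indented one level deeper than its def line: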
# -*- coding: utf-8 -*-
import scrapy
import urllib
from scrapy.spiders import CrawlSpider  # the base class must be imported
from mercado.items import MercadoItem


class MercadoSpider(CrawlSpider):
    name = 'mercado'
    item_count = 0
    # Scrapy reads allowed_domains (plural), and expects domains, not URLs
    allowed_domains = ['amazon.es']
    # start_urls is not used while start_requests() is overridden below
    start_urls = ['https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254']

    def start_requests(self):
        # The method body is indented under "def"; this fixes the IndentationError
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)
        for i in range(2,400):
            yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_item)

    def parse_item(self, response):
        ml_item = MercadoItem()
        #info de producto
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
        self.item_count += 1
        yield ml_item

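To see the rule in isolation, here is a minimal sketch that does not need Scrapy. It feeds two versions of a tiny generator function to Python's built-in compile(); the names BROKEN, FIXED, and indentation_error_for are made up for this illustration and do not come from the question.

```python
# Two versions of the same tiny function, kept as source strings so that
# compile() can check their syntax without executing anything.
BROKEN = (
    "def start_requests():\n"
    "yield 1\n"        # body sits at the same level as "def"
)
FIXED = (
    "def start_requests():\n"
    "    yield 1\n"    # body is indented one level under "def"
)


def indentation_error_for(source):
    """Return the IndentationError message for source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return exc.msg
    return None


broken_message = indentation_error_for(BROKEN)
fixed_message = indentation_error_for(FIXED)
print(broken_message)  # an "expected an indented block" message
print(fixed_message)   # None, because the fixed version compiles
```

The exact wording of the message varies slightly between Python versions, but it always points at the first line that should have been indented.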
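UPDATE: You now have code (not optimal) that requests the paginated search pages and code that parses a product detail page. What is missing is the step in between: you need to parse each search results page and extract the detail link of every item: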
    # These methods replace the ones inside MercadoSpider
    def start_requests(self):
        # Search result pages now go to parse_search, not parse_item
        yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page=1&keywords=febi&ie=UTF8&qid=1535314254",self.parse_search)
        for i in range(2,400):
            yield scrapy.Request("https://www.amazon.es/s/ref=sr_pg_2?rh=n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi&page="+str(i)+"&keywords=febi&ie=UTF8&qid=1535314254",self.parse_search)

    def parse_search(self, response):
        # Collect the detail-page link of every item on the search results page
        for item_link in response.xpath('//ul[@id="s-results-list-atf"]//a[contains(@class, "s-access-detail-page")]/@href').extract():
            # urljoin leaves absolute URLs unchanged and resolves relative ones
            yield scrapy.Request(response.urljoin(item_link), self.parse_item)

    def parse_item(self, response):
        ml_item = MercadoItem()
        #info de producto
        ml_item['articulo'] = response.xpath('normalize-space(//*[@id="productTitle"])').extract()
        ml_item['precio'] = response.xpath('normalize-space(//*[@id="priceblock_ourprice"])').extract()
        self.item_count += 1
        yield ml_item

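One possible way to tidy up the repeated URL string in start_requests is to build the search URLs from their decoded parameters with the standard library. This is only a sketch: BASE_URL and build_search_url are hypothetical names introduced here, and the parameter values are simply the ones that appear percent-encoded in the URLs above.

```python
from urllib.parse import urlencode

# Hypothetical constant: the part of the search URL before the query string.
BASE_URL = "https://www.amazon.es/s/ref=sr_pg_2"


def build_search_url(page):
    """Build the search URL for one results page (hypothetical helper)."""
    params = {
        # Decoded form of n%3A1951051031%2Cn%3A2424922031%2Ck%3Afebi
        "rh": "n:1951051031,n:2424922031,k:febi",
        "page": page,
        "keywords": "febi",
        "ie": "UTF8",
        "qid": "1535314254",
    }
    # urlencode percent-encodes ":" and "," and keeps the dict order
    return BASE_URL + "?" + urlencode(params)


# Same page range as the original code: page 1 plus range(2, 400)
search_urls = [build_search_url(page) for page in range(1, 400)]
print(search_urls[0])
```

Inside the spider, start_requests could then loop over range(1, 400) and yield scrapy.Request(build_search_url(page), self.parse_search) for each page, instead of concatenating strings by hand.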