Scrapy does not follow the given requests

Problem description

# -*- coding: utf-8 -*-
import logging

import scrapy
from scrapy.shell import inspect_response


class SuvlistingsSpider(scrapy.Spider):
    name = 'SuvListings'
    allowed_domains = ['https://www.gumtree.com.au']
    start_urls = [
        'https://www.gumtree.com.au/s-cars-vans-utes/sydney/carbodytype-suv/forsaleby-ownr/c18320l3003435/',
    ]

    def parse(self, response):
        self.log('Received response for listings page', level=logging.INFO)

        main = response.css('.panel-body.panel-body--flat-panel-shadow.user-ad-collection__list-wrapper')[-1]
        for a in main.css('a'):
            req = response.follow(a, callback=self.parse_item)
            yield req

    def parse_item(self, response):
        0/0
        yield {
            'price': response.xpath('normalize-space(//div[@id="ad-price"]/div/span[1])').extract(),
        }

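The code above never raises an exception, even though parse_item contains a deliberate 0/0, so the callback is apparently never called. I ran it under the PyCharm debugger. The selector passed to response.follow is an anchor selector, as described in the tutorial on the Scrapy website, yet nothing gets scraped. What is wrong here?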

Tags: python, scrapy

Solution


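In allowed_domains you must specify only the domain, without the scheme (www.gumtree.com.au). Otherwise Scrapy filters out every request as "offsite", because the request's domain does not match any of the allowed domains.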
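
To see why the scheme matters, here is a minimal sketch of a hostname comparison. It is a simplified stand-in for the offsite check, not Scrapy's actual implementation; the helper name is_allowed and the variables broken and fixed are made up for illustration.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals an allowed domain or is a subdomain of it.

    Hypothetical helper: a simplified approximation of an offsite check,
    not code taken from Scrapy.
    """
    host = urlparse(url).hostname or ''
    return any(host == d or host.endswith('.' + d) for d in allowed_domains)


listing_url = 'https://www.gumtree.com.au/s-cars-vans-utes/sydney/carbodytype-suv/forsaleby-ownr/c18320l3003435/'

# Entry from the question: it includes the scheme, so it never equals a bare hostname.
broken = ['https://www.gumtree.com.au']
# Corrected entry: bare domain only.
fixed = ['www.gumtree.com.au']

print(is_allowed(listing_url, broken))  # False
print(is_allowed(listing_url, fixed))   # True
```

In the spider itself, the corresponding change would be allowed_domains = ['www.gumtree.com.au']. Using 'gumtree.com.au' should work as well, since subdomains of an allowed domain are also accepted.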

