Can't send requests from a Scrapy parse callback

Problem description

I have a class that scrapes some data:

import scrapy

class SiteSpider(scrapy.Spider):
    name = "somesite"
    start_urls = ['https://www.somesite.com']

    def start_requests(self):
        parser = CommentParser()
        urls = ['https://www.somesite.com']
        for url in urls:
            yield scrapy.Request(url=url, callback=parser.scrape)

In the CommentParser class I have:

class CommentParser():
    def scrape(self, response):
        print("from CommentParser.scrape =>", response.url)
        for i in range(5):
            yield scrapy.Request(url="https://www.somesite.com/comments/?page=%d" % i, callback=self.parse)
    
    def parse(self, response):
        print("from CommentParser.parse =>", response.url)
        yield dict(response_url=response.url)

But Scrapy never sends the requests yielded from the CommentParser class, so I never get a response in CommentParser.parse.

Tags: python, scrapy

Solution


You have to lean on OOP here. Note the declaration SiteSpider(CommentParser): it makes SiteSpider inherit from CommentParser, so scrape and parse become methods of the spider itself, and Scrapy will schedule the requests they yield.

import scrapy

class CommentParser(scrapy.Spider):
    def scrape(self, response):
        print("from CommentParser.scrape =>", response.url)
        for i in range(5):
            yield scrapy.Request(url="https://www.somesite.com/comments/?page=%d" % i, callback=self.parse)

    def parse(self, response):
        print("from CommentParser.parse =>", response.url)
        yield dict(response_url=response.url)

class SiteSpider(CommentParser):
    name = "somesite"
    start_urls = ['https://www.somesite.com']

    def start_requests(self):
        urls = ['https://www.somesite.com']
        for url in urls:
        yield scrapy.Request(url=url, callback=self.scrape)  # This calls the scrape method inherited from CommentParser
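
To run this end to end, you can use the normal CLI (scrapy crawl somesite, matching the name attribute) once the spider lives in a Scrapy project. If you want a self-contained script instead, a minimal sketch using Scrapy's CrawlerProcess looks like the following; the LOG_LEVEL setting is just an illustrative choice, not something from the original answer:

import scrapy
from scrapy.crawler import CrawlerProcess

# CommentParser and SiteSpider defined exactly as above

process = CrawlerProcess(settings={
    "LOG_LEVEL": "INFO",  # illustrative setting; adjust to taste
})
process.crawl(SiteSpider)  # registers the spider; start_requests() runs on start
process.start()            # blocks until the crawl finishes

Since SiteSpider inherits from CommentParser, scrape and parse are bound to the running spider instance, which is why the requests they yield now flow through Scrapy's scheduler.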
