python - Can't send requests from Scrapy's parse callback
Problem description
I have a class for scraping some data:
import scrapy

class SiteSpider(scrapy.Spider):
    name = "somesite"
    start_urls = ['https://www.somesite.com']

    def start_requests(self):
        parser = CommentParser()
        urls = ['https://www.somesite.com']
        for url in urls:
            yield scrapy.Request(url=url, callback=parser.scrape)
In the CommentParser class I have:
class CommentParser():
    def scrape(self, response):
        print("from CommentParser.scrape =>", response.url)
        for i in range(5):
            yield scrapy.Request(url="https://www.somesite.com/comments/?page=%d" % i, callback=self.parse)

    def parse(self, response):
        print("from CommentParser.parse => ", response.url)
        yield dict(response_url=response.url)
But Scrapy never sends the requests yielded inside CommentParser, so CommentParser.parse never receives a response.
Solution
You have to use a bit of OOP here. Note the declaration SiteSpider(CommentParser): it means SiteSpider inherits from CommentParser, so the spider can use CommentParser's methods as its own callbacks.
import scrapy

class CommentParser(scrapy.Spider):
    def scrape(self, response):
        print("from CommentParser.scrape =>", response.url)
        for i in range(5):
            yield scrapy.Request(url="https://www.somesite.com/comments/?page=%d" % i, callback=self.parse)

    def parse(self, response):
        print("from CommentParser.parse => ", response.url)
        yield dict(response_url=response.url)

class SiteSpider(CommentParser):
    name = "somesite"
    start_urls = ['https://www.somesite.com']

    def start_requests(self):
        urls = ['https://www.somesite.com']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.scrape)  # this calls CommentParser's scrape method
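The key point is the inheritance itself, not anything Scrapy-specific. A minimal sketch of the same pattern without Scrapy (the class names mirror the spider above; the method body is just a stand-in for the real scrape() callback):

```python
# Sketch of the inheritance pattern above, without Scrapy, so it
# runs standalone. The return value is a stand-in for the real
# scrape() callback, which would yield Requests.
class CommentParser:
    def scrape(self):
        return "from CommentParser.scrape"

class SiteSpider(CommentParser):
    name = "somesite"

spider = SiteSpider()
# SiteSpider defines no scrape() of its own; Python's method
# resolution finds CommentParser.scrape, which is why
# callback=self.scrape works inside the spider.
print(spider.scrape())  # → from CommentParser.scrape
```

Because self.scrape is a bound method of the spider instance itself, Scrapy can schedule it as a callback like any other spider method, and the requests it yields go through the engine normally.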