python - How do I get all the URLs I want?
Problem description
def i_parse(self, response):
    # page_url: all sub-page links found on the current page
    page_url = response.xpath('//dl[@class="list-left public-box"]//a[contains(@target,"_blank")]/@href').getall()
    # next_page_url: URL of the next page
    next_page_url = self.start_urls[0] + response.xpath('//a[@class="page-en"]/@href').get()
    # Check for the last-page marker: if the result is empty, there is a next page;
    # otherwise this is the last page
    if response.xpath('//span[@class="page-ch"]') == []:
        yield page_url
        yield scrapy.Request(url=next_page_url, callback=self.i_parse, dont_filter=True)

for url in self.i_parse(response):
    print(url)
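Sorry, my English is poor, so this is a translation. I want to get the links to all of the sub-pages, but the output only covers the first page.
With yield, I don't know how to handle the callback in a recursive function. yield from
returns an error saying that the Request object is not iterable.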
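One likely reason only the first page shows up: looping over self.i_parse(response) by hand just iterates the generator once. The scrapy.Request it yields is only an object describing work to do, so nothing follows it unless Scrapy's engine receives it. It is also a single object, which is why yield from rejects it as not iterable. The sketch below is a Scrapy-free simulation of that mechanism, not Scrapy's actual implementation. PAGES, FakeRequest, naive and crawl are hypothetical names made up for illustration; only i_parse mirrors the question.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Union

# Hypothetical stand-in data: page name -> (sub-page links, next page or None)
PAGES = {
    "page1": (["a1", "a2"], "page2"),
    "page2": (["b1"], "page3"),
    "page3": (["c1", "c2"], None),
}


@dataclass
class FakeRequest:
    """Stand-in for scrapy.Request: a URL plus the callback to run on it."""
    url: str
    callback: Callable


def i_parse(page: str) -> Iterator[Union[str, FakeRequest]]:
    links, next_page = PAGES[page]
    # A list is iterable, so "yield from" emits each link on its own.
    yield from links
    # A request is a single object: it is yielded, never "yield from"-ed.
    if next_page is not None:
        yield FakeRequest(url=next_page, callback=i_parse)


def naive(start: str) -> List[str]:
    """Mirrors the manual loop in the question: requests are never followed."""
    return [x for x in i_parse(start) if not isinstance(x, FakeRequest)]


def crawl(start: str) -> List[str]:
    """Plays the engine's role: runs the callback of every yielded request."""
    collected: List[str] = []
    pending = [FakeRequest(url=start, callback=i_parse)]
    while pending:
        request = pending.pop(0)
        for result in request.callback(request.url):
            if isinstance(result, FakeRequest):
                pending.append(result)  # schedule the next page
            else:
                collected.append(result)  # a sub-page link
    return collected


print(naive("page1"))  # ['a1', 'a2']
print(crawl("page1"))  # ['a1', 'a2', 'b1', 'c1', 'c2']
```

In a real spider the engine does what crawl does here, so the usual approach is to let Scrapy call i_parse as a callback and to yield the links as items from it, instead of printing them from a manual loop. Separately, the string concatenation for next_page_url in the question raises a TypeError when .get() returns None on the last page, so the next-page lookup probably belongs inside the branch that has already confirmed a next page exists.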