首页 > 解决方案 > 自定义项目管道中多个文件的 CsvItemExporter 不导出所有项目

问题描述

我创建了一个项目管道作为这个问题的答案。
它应该根据page_no项目中设置的值为每个页面创建一个新文件。这工作得很好。
问题在于管道/项目导出器生成的最后一个 csv 文件,page-10.csv.
最后 10 个值未导出,因此文件保持为空。这种行为的原因可能是什么?

管道.py

from scrapy.exporters import CsvItemExporter

class PerFilenameExportPipeline:
    """Distribute items across multiple CSV files according to their 'page_no' field"""

    def open_spider(self, spider):
        self.filename_to_exporter = {}

    def spider_closed(self, spider):
        for exporter in self.filename_to_exporter.values():
            exporter.finish_exporting()

    def _exporter_for_item(self, item):
        filename = 'page-' + str(item['page_no'])
        del item['page_no']
        if filename not in self.filename_to_exporter:
            f = open(f'{filename}.csv', 'wb')
            exporter = CsvItemExporter(f, export_empty_fields=True)
            exporter.start_exporting()
            self.filename_to_exporter[filename] = exporter
        return self.filename_to_exporter[filename]

    def process_item(self, item, spider):
        exporter = self._exporter_for_item(item)
        exporter.export_item(item)
        return item

蜘蛛

import scrapy
from ..pipelines import PerFilenameExportPipeline


class spidey(scrapy.Spider):
    name = "idk"
    custom_settings = {
        'ITEM_PIPELINES': {
            PerFilenameExportPipeline: 100
        }
    }
    
    def start_requests(self):
        yield scrapy.Request("http://quotes.toscrape.com/", cb_kwargs={'page_no': 1})

    def parse(self, response, page_no):
        for qts in response.xpath("//*[@class=\"quote\"]"):
            yield {
                'page_no': page_no,
                'author' : qts.xpath("./span[2]/small/text()").get(),
                'quote' : qts.xpath("./*[@class=\"text\"]/text()").get()
            }

        next_pg = response.xpath('//li[@class="next"]/a/@href').get()      
        if next_pg is not None:
            yield response.follow(next_pg, cb_kwargs={'page_no': page_no + 1})

标签: scrapyscrapy-pipeline

解决方案


推荐阅读