How to use yield instead of print with scrapy selector and selenium?

Problem description

from scrapy import Spider
from selenium import webdriver
from scrapy.selector import Selector

class FlipkartSpider(Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']

    def start_requests(self):
        self.driver = webdriver.Chrome('C:\\Users\\xyz\\chromedriver')
        self.driver.get('https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels')
        sel = Selector(text=self.driver.page_source)
        prices = sel.xpath('//div/div[@class="_1vC4OE"]/text()').extract()
        for price in prices:
            print(price)

    def parse(self, response):
        pass

Here the scraper prints the prices, but when I replace print with yield it throws an error. I want to save the prices to a CSV file. How can I save the data using yield?

Tags: python, python-3.x, selenium-webdriver, web-scraping, scrapy

Solution


You can easily omit selenium and scrapy.Selector and just use the response in the parse method.

import scrapy

class FlipkartSpider(scrapy.Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']


    def start_requests(self):
        url = 'https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        prices = response.xpath('//div/div[@class="_1vC4OE"]/text()').extract()
        for price in prices:
            yield {'price':price}

Then run it with scrapy crawl flipkarttrial1 -o data.csv -t csv (the -t flag is optional here, since Scrapy infers the CSV format from the .csv extension).
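As an alternative to the command-line flags, the same export can be configured on the spider itself. A sketch, assuming Scrapy 2.1 or later (older versions use the FEED_URI and FEED_FORMAT settings instead):

```python
# Equivalent to `-o data.csv -t csv` on the command line.
# On the spider class, set: custom_settings = {'FEEDS': FEEDS}
FEEDS = {
    'data.csv': {'format': 'csv'},  # output path -> serialization format
}
```

With this in place, every dict the spider yields is written as one CSV row, with the dict keys as the header.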

Edit: If you still want to use Selenium, you can use Python's csv module to write the CSV file yourself.

The start_requests method must yield (or return) scrapy.Request(url, callback) objects; the callback method (in the code below, parse) then does the rest of the work. Defining start_urls achieves the same thing, since Scrapy builds those initial requests for you.

import csv

from scrapy import Spider
from scrapy.selector import Selector
from selenium import webdriver


class FlipkartSpider(Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels']

    def parse(self, response):
        # Raw string so the backslashes in the Windows path are not
        # treated as escape sequences
        self.driver = webdriver.Chrome(r'C:\xyz\chromedriver')
        self.driver.get(response.url)
        sel = Selector(text=self.driver.page_source)
        self.driver.quit()
        prices = sel.xpath('//div/div[@class="_1vC4OE"]/text()').extract()

        fieldnames = ['price']
        with open('output_data.csv', 'w', newline='') as output:
            csv_file = csv.DictWriter(output, fieldnames)
            csv_file.writeheader()
            for price in prices:
                row = {'price': price}
                csv_file.writerow(row)  # write the row to the CSV file
                yield row               # also yield the item to Scrapy
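For reference, here is the csv.DictWriter pattern on its own, writing to an in-memory buffer so the example is self-contained (the prices are hypothetical stand-ins for the XPath results):

```python
import csv
import io

# Hypothetical scraped prices standing in for the XPath extraction
prices = ['499', '1,299']

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['price'])
writer.writeheader()                   # header row: "price"
for price in prices:
    writer.writerow({'price': price})  # one row per scraped value

print(buffer.getvalue())
```

Note that writerow returns the underlying write's result, not the row, which is why the original code's yield csv_file.writerow(...) did not hand usable items back to Scrapy.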
