python - How to use yield instead of print with scrapy selector and selenium?
Problem description
```python
from scrapy import Spider
from selenium import webdriver
from scrapy.selector import Selector


class FlipkartSpider(Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']

    def start_requests(self):
        self.driver = webdriver.Chrome('C:\\Users\\xyz\\chromedriver')
        self.driver.get('https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels')
        sel = Selector(text=self.driver.page_source)
        prices = sel.xpath('//div/div[@class="_1vC4OE"]/text()').extract()
        for price in prices:
            print(price)

    def parse(self, response):
        pass
```
Here the scraper prints the prices, but when I use yield instead of print it throws an error. I want to save the prices to a CSV file. How can I save the data using yield?
Solution
You can drop Selenium and scrapy.Selector entirely and just use the response object that Scrapy passes to the parse method.
```python
import scrapy


class FlipkartSpider(scrapy.Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']

    def start_requests(self):
        url = 'https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels'
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        prices = response.xpath('//div/div[@class="_1vC4OE"]/text()').extract()
        for price in prices:
            yield {'price': price}
```
Then run it with `scrapy crawl flipkarttrial1 -o data.csv -t csv`. Each dict yielded from parse becomes one row in data.csv.
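To illustrate why yielding dicts is enough, here is a stdlib-only sketch of the idea behind `-o data.csv`: a generator (standing in for parse) yields one dict per price instead of printing it, and a consumer writes one CSV row per dict. The function names `parse_prices` and `export_csv` are hypothetical, and this is not Scrapy's actual feed exporter, only an approximation of what it does with yielded items.

```python
import csv
import io


def parse_prices(prices):
    # Stand-in for a Scrapy callback: yield one item (a dict) per price
    # instead of printing it, so the caller decides what to do with it
    for price in prices:
        yield {'price': price}


def export_csv(items, fieldnames):
    # Rough stand-in for the CSV feed export: a header, then one row per item
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for item in items:
        writer.writerow(item)
    return buf.getvalue()
```

For example, `export_csv(parse_prices(['499', '1,299']), ['price'])` produces a header line followed by one line per price; the value containing a comma is quoted automatically by the csv module.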
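Edit: If you still want to use Selenium, you can write the CSV file yourself with Python's csv module.
Note that start_requests must yield scrapy.Request(url, callback) objects, not items; yielding prices from start_requests, as in the question, is why adding yield there raises an error. The callback then does the rest of the work. In the code below, start_urls is used instead of overriding start_requests, so Scrapy creates the requests itself and calls parse, the default callback, for each response.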
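```python
import csv
from scrapy import Spider
from scrapy.selector import Selector
from selenium import webdriver


class FlipkartSpider(Spider):
    name = 'flipkarttrial1'
    allowed_domains = ['flipkart.com']
    start_urls = ['https://www.flipkart.com/womens-footwear/heels/pr?sid=osp,iko,6q1&otracker=nmenu_sub_Women_0_Heels']

    def parse(self, response):
        # Raw string, otherwise "\x" in the path is an invalid escape (Selenium 3 style call)
        driver = webdriver.Chrome(r'C:\xyz\chromedriver')
        try:
            driver.get(response.url)
            sel = Selector(text=driver.page_source)
        finally:
            driver.quit()  # always close the browser
        prices = sel.xpath('//div/div[@class="_1vC4OE"]/text()').extract()
        # newline='' avoids blank lines between rows on Windows
        with open('output_data.csv', 'w', newline='') as output:
            writer = csv.DictWriter(output, fieldnames=['price'])
            writer.writeheader()
            for price in prices:
                writer.writerow({'price': price})  # write the row to the CSV file
                yield {'price': price}  # yield the item, not writerow()'s return value
```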
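A note on `yield csv_file.writerow(...)`: `writerow()` returns whatever the underlying file's `write()` returns, which for a text file is the number of characters written (an int). Scrapy expects callbacks to yield requests, items such as dicts, or None, so an int is not a valid item; writing the row and yielding the dict should be separate steps. The stdlib-only check below contrasts the two patterns; the function names are hypothetical and an in-memory `io.StringIO` stands in for the real output file.

```python
import csv
import io


def yield_writerow_results(prices, output):
    # Pattern from the line above: yields writerow()'s return value,
    # i.e. a character count, not an item
    writer = csv.DictWriter(output, fieldnames=['price'])
    for price in prices:
        yield writer.writerow({'price': price})


def write_and_yield_items(prices, output):
    # Separate steps: write the CSV row, then yield the item itself
    writer = csv.DictWriter(output, fieldnames=['price'])
    writer.writeheader()
    for price in prices:
        writer.writerow({'price': price})
        yield {'price': price}
```

Consuming `yield_writerow_results(['499'], io.StringIO())` gives `[5]` (the length of `'499\r\n'`), while `write_and_yield_items` gives the dicts and still fills the CSV buffer.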