Passing a rendered page from Selenium to Scrapy

Problem description

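I want to scrape JavaScript-rendered pages that require a login. Is it possible to use Selenium to load the page and log in, and then pass the rendered HTML to Scrapy for data extraction?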

import scrapy
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from seleniumrequests import Firefox, Chrome
from time import sleep
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


class ContractSpider(scrapy.Spider):

    name = "contracts"

    def start_requests(self):
        url = 'https://adactmedical.com/tpd'
        yield scrapy.Request(url=url, callback=self.parse)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Keep the driver on the instance so the other methods can reach it.
        # executable_path is the Selenium 3 style argument; newer Selenium
        # releases expect a Service object instead.
        self.driver = Firefox(executable_path='C:/Users/Matija/Dropbox/Programing/Scraping/geckodriver.exe')
        self.driver.implicitly_wait(5)

    def get_selenium_response(self, url):
        # Load the page in the real browser and return the rendered HTML as str.
        self.driver.get(url)
        return self.driver.page_source

    def parse(self, response):
        # Selector(text=...) expects a str, so page_source is passed unencoded.
        selenium_response = Selector(text=self.get_selenium_response(response.url))
        print(selenium_response)

Tags: python, selenium, scrapy

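Solution

This takes a little JavaScript, but it is easy to do. Once the page has had time to load, ask the browser for its rendered DOM through execute_script: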

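import time

# `driver` is the Selenium WebDriver from the question, already on the target page.
time.sleep(5)  # give the page's JavaScript time to finish rendering
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
print(html)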
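
To complete the hand-off the question asks about, the rendered HTML string can be wrapped in a Scrapy response so that the usual `.css()` and `.xpath()` selectors work on it. The following is only a minimal sketch, assuming `driver` is a Selenium WebDriver that has already logged in and loaded the page; the helper names `rendered_html` and `to_scrapy_response` are hypothetical and not part of either library.

```python
def rendered_html(driver):
    """Return the DOM as the browser currently sees it, after JavaScript ran."""
    # outerHTML keeps the <html> element itself; innerHTML would drop it.
    return driver.execute_script("return document.documentElement.outerHTML")


def to_scrapy_response(driver):
    """Wrap the rendered page in a scrapy.http.HtmlResponse for extraction.

    Hypothetical helper: `driver` is assumed to be logged in already.
    """
    # Imported lazily so this module still loads where Scrapy is not installed.
    from scrapy.http import HtmlResponse

    return HtmlResponse(
        url=driver.current_url,
        body=rendered_html(driver),
        encoding="utf-8",  # required because the body is passed as str
    )
```

Inside the spider's `parse` method, something like `to_scrapy_response(self.driver).css("title::text").get()` should then behave like a query against any other Scrapy response. A fixed `time.sleep` is still a guess at when rendering has finished, so waiting for a specific element may be more reliable.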
