How to submit a form with a session in Scrapy

Problem description

I am trying to scrape a website using Scrapy. To get the content I want, I need to log in first. The URL is login_url.

The form looks like this:

(screenshot of the login form)

My code is as follows:

import scrapy
from scrapy import FormRequest
from scrapy.crawler import CrawlerProcess
from scrapy.shell import inspect_response

LOGIN_URL1 = "https://www.partslink24.com/partslink24/user/login.do"

class PartsSpider(scrapy.Spider):
    name = "parts"
    login_url = LOGIN_URL1
    start_urls = [
        login_url,
    ]

    def parse(self, response):
        # COMPANY_ID, USERNAME and PASSWORD hold my credentials
        form_data = {
            'accountLogin': COMPANY_ID,
            'userLogin': USERNAME,
            'loginBean.password': PASSWORD
        }
        yield FormRequest(url=self.login_url, formdata=form_data, callback=self.parse1)

    def parse1(self, response):
        # drop into a Scrapy shell to inspect what came back
        inspect_response(response, self)
        print("RESPONSE: {}".format(response))


def start_scraper(vin_number):
    process = CrawlerProcess()
    process.crawl(PartsSpider)
    process.start()
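Side note: to watch whether the site's session cookie is being set and re-sent at all, Scrapy's built-in COOKIES_DEBUG setting can be enabled; a minimal sketch of dropping it into the spider:

class PartsSpider(scrapy.Spider):
    name = "parts"
    # COOKIES_DEBUG makes Scrapy log every Cookie / Set-Cookie header,
    # which shows whether the session id from login.do is stored and re-sent.
    custom_settings = {
        'COOKIES_DEBUG': True,
    }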

The problem, however, is that they check whether a session is already active, and I get an error: the form cannot be submitted.

When I inspect the response I get after submitting the login form, I see the following error:

(screenshot of the error page)

The check in their site's code looks like this:

var JSSessionChecker = {
    check: function()
    {
        if (!Ajax.getTransport())
        {
            alert('NO_AJAX_IN_BROWSER');
        }
        else
        {
            
            new Ajax.Request('/partslink24/checkSessionCookies.do', {
                method:'post',
                onSuccess: function(transport)
                {
                    if (transport.responseText != 'true')
                    {
                        if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                    }
                },
                onFailure: function(e) 
                { 
                    if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                },
                onException: function (request, e) 
                { 
                    if (Object.isFunction(JSSessionChecker.showError)) JSSessionChecker.showError(); 
                }
            });
        }
    },
    
    showError: function()
    {
        var errorElement = $('sessionCheckError');
        if (errorElement)
        {
            errorElement.show();
        }
    }
};
JSSessionChecker.check();

On success it simply returns true.
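For what it's worth, that check can be replicated from Scrapy before submitting the form; a minimal sketch, assuming the default cookie middleware is enabled (after_session_check is a hypothetical callback name):

    def parse(self, response):
        # Mirror JSSessionChecker.check(): POST to checkSessionCookies.do.
        # The default CookiesMiddleware re-sends whatever cookies the
        # login page set, so this tells us whether the session is live.
        yield scrapy.Request(
            url=response.urljoin('/partslink24/checkSessionCookies.do'),
            method='POST',
            callback=self.after_session_check,
            dont_filter=True,
        )

    def after_session_check(self, response):
        if response.text.strip() == 'true':
            # session cookies accepted; safe to submit the login form here
            pass
        else:
            self.logger.error('Session cookie check failed')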

Is there any way to activate the session before submitting the form?

Thanks in advance.

EDIT

This is the error page I get using @fam's answer:

(screenshot of the error page after applying @fam's answer)

Tags: python, scrapy

Solution


Please check this code.

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.shell import inspect_response

LOGIN_URL1 = "https://www.partslink24.com/partslink24/user/login.do"

class PartsSpider(scrapy.Spider):
    name = "parts"
    login_url = LOGIN_URL1
    start_urls = [
        login_url,
    ]

    def parse(self, response):
        # from_response() copies the hidden fields of the rendered login
        # form, so only the loginBean.* values need to be set explicitly
        form_data = {
            'loginBean.accountLogin': "COMPANY_ID",
            'loginBean.userLogin': "USERNAME",
            'loginBean.sessionSqueezeOut': "false",
            'loginBean.password': "PASSWORD",
            'loginBean.userOffsetSec': "18000",
            'loginBean.code2f': ""
        }
        yield scrapy.FormRequest.from_response(response=response, url=self.login_url, formdata=form_data, callback=self.parse1)

    def parse1(self, response):
        # inspect_response(response, self)
        print("RESPONSE: {}".format(response))


def start_scraper(vin_number):
    process = CrawlerProcess()
    process.crawl(PartsSpider)
    process.start()

I don't get an error, and the response is the following:

RESPONSE: <200 https://www.partslink24.com/partslink24/user/login.do>
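One loose end: start_scraper receives vin_number but never uses it. A sketch of wiring it through (keyword arguments to crawl() are forwarded to the spider's constructor, so they become spider attributes; the vin_number attribute name is just an assumption):

def start_scraper(vin_number):
    process = CrawlerProcess()
    # Scrapy forwards crawl() kwargs to the spider, so it can later be
    # read as self.vin_number when building the part-search requests.
    process.crawl(PartsSpider, vin_number=vin_number)
    process.start()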

EDIT: The following code works with Selenium, and it logs you in to the page without trouble. You just need to download the Chrome driver and install Selenium.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time


chrome_options = Options()
# chrome_options.add_argument("--headless")  # uncomment to run without a visible browser


driver = webdriver.Chrome(executable_path="./chromedriver", options=chrome_options)
driver.get("https://www.partslink24.com/partslink24/user/login.do")

# enter the form fields
company_ID = "company id"
user_name = "user name"
password = "password"

company_ID_input = driver.find_element_by_xpath("//input[@name='accountLogin']")
company_ID_input.send_keys(company_ID)
time.sleep(1)

user_name_input = driver.find_element_by_xpath("//input[@name='userLogin']")
user_name_input.send_keys(user_name)
time.sleep(1)

password_input = driver.find_element_by_xpath("//input[@id='inputPassword']")
password_input.send_keys(password)
time.sleep(1)

# click the login button and wait for the session to be established
click_btn = driver.find_element_by_xpath("//a[@tabindex='5']")
click_btn.click()
time.sleep(5)

Don't forget to change the credentials.
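If the rest of the scraping should still happen in Scrapy, the session Selenium just established can be carried over; a minimal sketch continuing the script above (the target URL is a placeholder):

import scrapy

# After click_btn.click() above, the browser holds the live session.
# driver.get_cookies() returns it as a list of dicts; Scrapy accepts
# cookies directly via the cookies= argument of a Request.
session_cookies = {c['name']: c['value'] for c in driver.get_cookies()}

request = scrapy.Request(
    url="https://www.partslink24.com/...",  # placeholder: page to scrape
    cookies=session_cookies,
)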
