首页 > 解决方案 > Python、Selenium、Pandas DataFrame 和 Excel

问题描述

我无法拼凑拼图的最后一部分。整个代码如下所示,其中包括我正在抓取数据的站点的非必要用户名和密码。

在循环使用 Excel 文件中的零件号后

pd.read_excel()

Selenium 用于抓取相关网站的各种项目;然后代码成功地将这些值写入输出窗口。

与将数据写入输出窗口相反,我的目标是写入要从中提取数据的同一个 Excel 文件,并将其写入相应的列。

在代码的最后一个for循环中,我最初尝试通过附加将变量(正在打印到屏幕上)写入 Excel

.to_excel('filePathHere') 

到有问题的变量。例如,我尝试

description.to_excel('pathToFile/output.xlsx')

这会产生错误EOL while scanning string literal (<string>, line 1)

然后我想,也许这个变量需要转换为DataFrame,所以我尝试了

description_DataFrame = pd.DataFrame(description)
description_DataFrame.to_excel('pathToFile/output.xlsx')

这导致了相同的错误消息。

我什至不确定这是否是将每个项目写入现有(或新)文件的正确逻辑。如果是,我在这里找到了关于如何处理长字符串的解释:StackOverFlow EOL 错误,但我的数据都不是长字符串,所以我看不出它是如何应用的。

然后我开始认为我可能需要创建一个字典,然后附加到它。所以我然后从上面删除了任何尝试并尝试:

description = []
description.append(mfg_part)
mfg_part.to_excel('pathToFile/output.xlsx')

这仍然给我同样的 EOL 错误。

我不确定出了什么问题,以及为什么我不能将变量写入mfg_part, mfg_OEM, description加载的 Excel 文件中它们各自的列。

任何提示/提示将不胜感激。

完整的工作代码,打印到屏幕如下:

import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
import itertools


# load in manufacture part number from a collection of components, via an Excel file
mfg_id_list = pd.read_excel("C:/Users/James/Documents/Python Scripts/jupyterNoteBooks/ScrapingData/MasterQuoteTemplate.xls")['Model']

# Create a dictionary to store product and price
# While the below works just fine, we want to create en empty pandas dataframe, so we can output to Excel later
productInfo = {}

chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.tessco.com/login")

userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"

#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)

elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID"))) 
elem.send_keys(userName)

elem = wait10.until(EC.element_to_be_clickable((By.ID, "password"))) 
elem.send_keys(password)

#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()

for i in mfg_id_list:

    #Expand the search bar
    searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))

    #Enter information into the search bar
    #If cell is not blank

    if len(str(i)) != 0:
        searchBar.send_keys(Keys.CONTROL, 'a')
        searchBar.send_keys(i)
        driver.find_element_by_css_selector('a.inputButton').click()
        time.sleep(5)

        try:
            # wait for the products information to be loaded
            products = wait10.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
            #isProductsThere = driver.find_element_by_xpath("//div[@class='CoveoResult']")

            if products:
                # iterate through all products in the search result and add details to dictionary
                for product in products:

                    # get product info such as OEM, Description and Part Number
                    productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
                    mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
                    mfgName = product.find_element_by_tag_name("img").get_attribute("alt")

                    # There are multiple classes, some are "class sale" or else.
                    #We will locate by CSS
                    price = product.find_element_by_css_selector("div.price").text.split('\n')[1]


                    # add details to dictionary
                    productInfo[mfgPart, mfgName, productDescr] = price

                # prints the searched products information   
                for (mfg_part, mfg_OEM, description), price in productInfo.items():
                    mfg_id = mfg_part.split(': ')[1]
                    if mfg_id == i:
                        #Here is where I would write to an Excel file
                        #And where I made attempts as described above
                        print('________________________________________________')
                        print('Part #:', mfg_id)
                        print('Company:', mfg_OEM)
                        print('Description:', description)
                        print('Price:', price)
                        print('________________________________________________')


                #time.sleep(5)
                #driver.close()

            else:
                mfg_id = "Not on Tessco"
                mfg_OEM = "Not on Tessco"
                description = "Not on Tessco"
                price = "Not on Tessco"
                #driver.close()
                print("Item was not found on Tessco.com")


        except Exception as e:
            print('________________________________________________')            
            print(e)
            mfg_id = "Not on Tessco"
            mfg_OEM = "Not on Tessco"
            description = "Not on Tessco"
            price = "Not on Tessco"
            #driver.close()
            print("Item was not found on Tessco.com")
            print('________________________________________________')

driver.close()

标签: excelpython-3.xpandasseleniumdataframe

解决方案


推荐阅读