excel - Python、Selenium、Pandas DataFrame 和 Excel
问题描述
我无法拼凑拼图的最后一部分。整个代码如下所示,其中包括我正在抓取数据的站点的非必要用户名和密码。
在循环使用 Excel 文件中的零件号后
pd.read_excel()
Selenium 用于抓取相关网站的各种项目;然后代码成功地将这些值写入输出窗口。
与将数据写入输出窗口相反,我的目标是写入要从中提取数据的同一个 Excel 文件,并将其写入相应的列。
在代码的最后一个for
循环中,我最初尝试通过附加将变量(正在打印到屏幕上)写入 Excel
.to_excel('filePathHere')
到有问题的变量。例如,我尝试
description.to_excel('pathToFile/output.xlsx')
这会产生错误EOL while scanning string literal (<string>, line 1)
然后我想,也许这个变量需要转换为DataFrame,所以我尝试了
description_DataFrame = pd.DataFrame(description)
description_DataFrame.to_excel('pathToFile/output.xlsx')
这导致了相同的错误消息。
我什至不确定这是否是将每个项目写入现有(或新)文件的正确逻辑。如果是,我在这里找到了关于如何处理长字符串的解释:StackOverFlow EOL 错误,但我的数据都不是长字符串,所以我看不出它是如何应用的。
然后我开始认为我可能需要创建一个字典,然后附加到它。所以我然后从上面删除了任何尝试并尝试:
description = []
description.append(mfg_part)
mfg_part.to_excel('pathToFile/output.xlsx')
这仍然给我同样的 EOL 错误。
我不确定出了什么问题,以及为什么我不能将变量写入mfg_part, mfg_OEM, description
加载的 Excel 文件中它们各自的列。
任何提示/提示将不胜感激。
完整的工作代码,打印到屏幕如下:
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
import itertools
# load in manufacture part number from a collection of components, via an Excel file
mfg_id_list = pd.read_excel("C:/Users/James/Documents/Python Scripts/jupyterNoteBooks/ScrapingData/MasterQuoteTemplate.xls")['Model']
# Create a dictionary to store product and price
# While the below works just fine, we want to create en empty pandas dataframe, so we can output to Excel later
productInfo = {}
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
for i in mfg_id_list:
#Expand the search bar
searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))
#Enter information into the search bar
#If cell is not blank
if len(str(i)) != 0:
searchBar.send_keys(Keys.CONTROL, 'a')
searchBar.send_keys(i)
driver.find_element_by_css_selector('a.inputButton').click()
time.sleep(5)
try:
# wait for the products information to be loaded
products = wait10.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
#isProductsThere = driver.find_element_by_xpath("//div[@class='CoveoResult']")
if products:
# iterate through all products in the search result and add details to dictionary
for product in products:
# get product info such as OEM, Description and Part Number
productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
# There are multiple classes, some are "class sale" or else.
#We will locate by CSS
price = product.find_element_by_css_selector("div.price").text.split('\n')[1]
# add details to dictionary
productInfo[mfgPart, mfgName, productDescr] = price
# prints the searched products information
for (mfg_part, mfg_OEM, description), price in productInfo.items():
mfg_id = mfg_part.split(': ')[1]
if mfg_id == i:
#Here is where I would write to an Excel file
#And where I made attempts as described above
print('________________________________________________')
print('Part #:', mfg_id)
print('Company:', mfg_OEM)
print('Description:', description)
print('Price:', price)
print('________________________________________________')
#time.sleep(5)
#driver.close()
else:
mfg_id = "Not on Tessco"
mfg_OEM = "Not on Tessco"
description = "Not on Tessco"
price = "Not on Tessco"
#driver.close()
print("Item was not found on Tessco.com")
except Exception as e:
print('________________________________________________')
print(e)
mfg_id = "Not on Tessco"
mfg_OEM = "Not on Tessco"
description = "Not on Tessco"
price = "Not on Tessco"
#driver.close()
print("Item was not found on Tessco.com")
print('________________________________________________')
driver.close()
解决方案
推荐阅读
- python - 根据二叉树逐级制作嵌套列表
- react-native - 从 React Native 中的不同文件导入函数
- java - 是每个 Spring 应用程序上下文还是每个 JVM 的类路径?
- python - Tkinter 自动滚动
- javascript - 我应该添加 useCallback 或 useMemo 到我的效果吗?
- angular - 将 HttpErrorResponse 值解析为 JSON 对象
- jenkins-pipeline - Jenkins pipeline Lockable Resources 插件:如何根据定义的标签锁定特定资源
- javascript - 如何在 mathjs.evaluate 中处理 NaN 值
- ruby-on-rails - 如何在 ES5 中实现 PureComponent
- c# - 如何将数据发布到 wordpress woocommerce api?