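python - How to automate login credentials with the Selenium Chrome WebDriver
Problem description
I am trying to extract data from [this][1] website:
The manual procedure is to type a string such as "CCOCCO" into the search box, click "Predict Properties", and record "Glass Transition Temperature (K)" from the results table.
As long as fewer than 5 predictions are requested, the following code automates that task: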
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(options=options)

def get_glass_temperature(smiles):
    # Open the property-prediction page
    driver.get('https://www.polymergenome.org/explore/index.php?m=1')
    x_path_click = "//input[@class='large_input_no_round ui-autocomplete-input' and @id='keyword_original']"
    x_path_find = "//input[@class='dark_blue_button_no_round' and @value='Predict Properties']"
    x_path_get = "//table[@class='record']//tbody/tr[@class='record']//following::td[7]/center/font/font"
    # Type the SMILES string into the search box and click "Predict Properties"
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_click))).send_keys(smiles)
    driver.find_element(By.XPATH, x_path_find).click()
    # Return the glass transition temperature cell from the results table
    return WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, x_path_get))).get_attribute("innerHTML")
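I applied the function above to a pandas DataFrame containing 400 strings similar to "CCOCCO". However, after 5 "Glass Temperature" values are returned, a WebDriverException is raised because the website shows the following message: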
"Visits of more than 5 times per day to the property prediction capability requires login. "
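Before running the code, I logged in to the website and checked the "Remember me" box, but the error is the same.
I tried modifying the code as follows: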
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import os

options = Options()
options.add_argument('start-maximized')
options.add_argument('disable-infobars')
options.add_argument('--disable-extensions')
driver = webdriver.Chrome(options=options, executable_path='/Users/ae/Downloads/chromedriver')

def get_glass_temperature(smiles):
    driver.get('https://www.polymergenome.org/explore/index.php?m=1')
    user_name = 'my_user_name'
    password = 'my_password'
    x_path_id = "//input[@class='large_input_no_round' and @placeholder='User ID']"
    x_path_pass = "//input[@class='large_input_no_round' and @placeholder='Password']"
    x_path_sign = "//input[@class='orange_button_no_round' and @value='Sign In']"
    # Log in first (this is the step that times out)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_id))).send_keys(user_name)
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_pass))).send_keys(password)
    driver.find_element(By.XPATH, x_path_sign).click()
    x_path_click = "//input[@class='large_input_no_round ui-autocomplete-input' and @id='keyword_original']"
    x_path_find = "//input[@class='dark_blue_button_no_round' and @value='Predict Properties']"
    x_path_get = "//table[@class='record']//tbody/tr[@class='record']//following::td[7]/center/font/font"
    # Then run the prediction and read the result, as before
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_click))).send_keys(smiles)
    driver.find_element(By.XPATH, x_path_find).click()
    return WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, x_path_get))).get_attribute("innerHTML")

test_smiles = ['CC(F)(F)CC(F)(F)', 'CCCCC(=O)OCCOC(=O)', 'CNS-C6H3-CSN-C6H3', 'CCOCCO', 'NH-CS-NH-C6H4', 'C4H8', 'C([*])C([*])(COOc1cc(Cl)ccc1)']
test_polymer = pd.DataFrame({'SMILES': test_smiles})
test_polymer['test_tg'] = test_polymer['SMILES'].apply(get_glass_temperature)
print(test_polymer)
After the modification, I get a timeout error:
Traceback (most recent call last):
  File "/Users/alieftekhari/Desktop/extract_TG.py", line 42, in <module>
    test_polymer['test_tg']=test_polymer['SMILES'].apply(get_glass_temperature)
  File "/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/Users/user/Desktop/extract_TG.py", line 22, in get_glass_temperature
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_id))).send_keys(user_name)
  File "/anaconda/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
[1]: https://www.polymergenome.org/explore/index.php?m=1
Solution
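Look at the last lines of the stack trace:
  File "/anaconda/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
This means the explicit wait expired without finding a clickable element that matches your XPath, which is why you get a TimeoutException. From what I can see, your XPath is wrong: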
x_path_id="//input[@class='large_input_no_round ui-autocomplete-input' and @placeholder='User ID']"
x_path_pass="//input[@class='large_input_no_round ui-autocomplete-input' and @placeholder='Password']"
The login inputs do not have the class 'large_input_no_round ui-autocomplete-input', so change the XPath to use the correct class, as shown below:
x_path_id="//input[@class='large_input_no_round' and @placeholder='User ID']"
x_path_pass="//input[@class='large_input_no_round' and @placeholder='Password']"
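Why the class name matters: in XPath, @class='...' compares the whole attribute value as a single string, so an input whose class attribute is exactly 'large_input_no_round' is not matched by a predicate asking for 'large_input_no_round ui-autocomplete-input'. The sketch below checks this with Python's built-in xml.etree.ElementTree against a small snippet; the snippet is hypothetical markup modeled on the inputs discussed here, and the live page's HTML may differ. ElementTree supports only a subset of XPath and has no 'and', so two chained predicates stand in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the inputs discussed above; the live page may differ.
SNIPPET = """<form>
  <input class="large_input_no_round ui-autocomplete-input" id="keyword_original"/>
  <input class="large_input_no_round" placeholder="User ID"/>
  <input class="large_input_no_round" placeholder="Password"/>
</form>"""

root = ET.fromstring(SNIPPET)

def count_matches(path):
    """Return how many elements in SNIPPET match an ElementTree path expression."""
    return len(root.findall(path))

# Chained predicates play the role of XPath 'and' here.
# @class='...' compares the entire attribute string, not individual class names.
wrong_user_id = ".//input[@class='large_input_no_round ui-autocomplete-input'][@placeholder='User ID']"
right_user_id = ".//input[@class='large_input_no_round'][@placeholder='User ID']"

print(count_matches(wrong_user_id))  # no login input carries both class names
print(count_matches(right_user_id))  # exactly the User ID field
```

If you need a locator that tolerates extra class names, the XPath 1.0 idiom contains(concat(' ', normalize-space(@class), ' '), ' large_input_no_round ') matches on a single class token; ElementTree cannot evaluate it, but the browser's XPath engine used by Selenium can.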
Problem
driver.get('https://www.polymergenome.org/explore/index.php?m=1')
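This page has no login window, so the following line raises a TimeoutException:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, x_path_id))).send_keys(user_name)
In other words, every time you run the script it starts a new browser instance, so your earlier login is gone, and you now have to log in to get past this limit: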
Visits of more than 5 times per day to the property prediction capability requires login.
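Moreover, the login window only appears after 5 successful extraction iterations. Your script fails because it tries to log in straight away instead of waiting for the login dialog, and since there is no login window yet, it raises a TimeoutException.
The solution is to put the data-extraction part in a try block and the login in the catch block, so that the login runs only when extraction throws an exception. My Java implementation looks like this: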
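Why a missing element shows up as a TimeoutException rather than an immediate error: WebDriverWait(driver, 20).until(...) keeps re-checking its condition until the condition succeeds or 20 seconds pass, and only then gives up. The following is a simplified, stdlib-only model of that polling behaviour for illustration; wait_until and the local TimeoutException class are hypothetical stand-ins, not Selenium's own code.

```python
import time

class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException in this sketch."""

def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    A simplified model of an explicit wait: a condition that is never met
    (for example, a login box that is not on the page) ends in a timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutException("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)
```

A condition such as "the User ID field is clickable" that can never become true on the search page therefore always ends in the exception, after the full timeout has elapsed.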
@Test(invocationCount = 7)
public void getList(){
    wait = new WebDriverWait(driver, 20);
    By locator = By.xpath("//table[@class='record']//tbody/tr[@class='record']//following::td[7]/center/font/font");
    try {
        // Extraction: works until the 5-visit limit is reached
        driver.findElement(By.xpath("//input[@class='large_input_no_round ui-autocomplete-input' and @id='keyword_original']")).clear();
        driver.findElement(By.xpath("//input[@class='large_input_no_round ui-autocomplete-input' and @id='keyword_original']")).sendKeys("CCOCCO");
        driver.findElement(By.xpath("//input[@class='dark_blue_button_no_round' and @value='Predict Properties']")).click();
        String text = wait.until(ExpectedConditions.visibilityOfElementLocated(locator)).getAttribute("innerHTML");
        System.out.println(text);
    } catch (Exception e) {
        // Extraction failed, most likely because the login dialog appeared: log in now
        System.out.println("In Exception Block");
        wait.until(ExpectedConditions.elementToBeClickable(By.xpath("//input[@class='large_input_no_round' and @placeholder='User ID']")));
        driver.findElement(By.xpath("//input[@class='large_input_no_round' and @placeholder='User ID']")).sendKeys("testing");
        driver.findElement(By.xpath("//input[@class='large_input_no_round' and @placeholder='Password']")).sendKeys("testing");
        driver.findElement(By.xpath("//input[@class='orange_button_no_round' and @value='Sign In']")).click();
    }
}
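A Python counterpart of the same try/except idea, as a sketch: extract and login are hypothetical callables. For example, extract could be the first version of get_glass_temperature (without the login steps), which reloads the search page on every call, and login a helper that fills in the User ID and Password fields. Unlike the Java test, which only logs in when an iteration fails and relies on invocationCount to move on, this version retries the failed input once after logging in, so every row still gets a value.

```python
def extract_with_login_fallback(smiles, extract, login):
    """Try to extract a value; if that fails, log in once and retry.

    `extract(smiles)` and `login()` are hypothetical callables that would wrap
    the Selenium steps. A second failure is not caught and propagates to the caller.
    """
    try:
        return extract(smiles)
    except Exception:
        # Assumed cause: the site now requires a login (daily visit limit reached).
        login()
        return extract(smiles)
```

With pandas this could be used as test_polymer['SMILES'].apply(lambda s: extract_with_login_fallback(s, get_glass_temperature, log_in)), where log_in is a hypothetical function containing the login steps. Catching Exception mirrors the Java answer; narrowing it to Selenium's exception types would avoid hiding unrelated bugs.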
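Other approaches
- The best approach is to open the website, navigate to the login dialog and log in; once logged in, go to the search page and continue extracting.
- Alternatively, count the extractions and log in once you reach the limit of 5.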
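The counting approach can be sketched as a loop that logs in just before the sixth extraction. This assumes the limit is 5 visits per day, as the site's message says, and that one login lasts for the rest of the browser session; extract and login are hypothetical callables wrapping the Selenium steps. If you already used some of today's free visits in an earlier run, pass a smaller free_visits (or 0 to log in first, which is the first approach above).

```python
FREE_VISITS_PER_DAY = 5  # limit quoted in the site's message

def extract_all(smiles_list, extract, login, free_visits=FREE_VISITS_PER_DAY):
    """Extract every item, logging in once before the free quota runs out.

    `extract(smiles)` and `login()` are hypothetical callables; this function
    only decides *when* to log in.
    """
    results = []
    logged_in = False
    for count, smiles in enumerate(smiles_list):
        # count is the number of extractions already done in this run
        if count >= free_visits and not logged_in:
            login()
            logged_in = True
        results.append(extract(smiles))
    return results
```

Logging in proactively avoids relying on an exception to detect the limit, at the cost of hard-coding the quota.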