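python - How to continue if an element is not found - scraping with Python
Problem description
I am scraping a page that is basically a search engine. I send a client code (called a CPF) to a form input with send_keys, the page returns some information, and I scrape that. The scraping code is almost done, but I can't handle invalid client codes.
The page works like this:
1. If the client code is OK, the page redirects and shows some information that I can already scrape;
2. If the client code doesn't have all its digits, the "search" button does nothing;
3. If the client code has all its digits but something is wrong with it, the page shows a popup.
In cases 2 and 3, I want to print something (CPF Invalido) and move on to the next client code. This is the code I have so far: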
for cpf in self.cpfs:
    print(f"Procurando {cpf}.")
    self.driver.get(self.bot_url)
    cpf_input = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[1]/input')
    cpf_input.send_keys(cpf)
    time.sleep(2)
    cpfButton = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[2]/button')
    cpfButton.click()
    time.sleep(2)
    self.delay = 3  # seconds
    nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
    idade = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/ul/li[2]").text
    age = re.search(r'\((.*?)Anos', idade).group(1)
    beneficio = self.driver.find_element_by_xpath(
        "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[5]/span/b").text
    concessao = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[2]/span").text
    salario = self.driver.find_element_by_xpath(
        "/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
    bancos = self.driver.find_element_by_xpath('//*[@id="loans"]').text
    bancosw = re.findall(r'(?<=Banco )(\w+)', bancos)
    bankslist = ', '.join(bancosw)
    bancocard = self.driver.find_element_by_xpath('//*[@id="cards"]').text
    bcardw = re.findall(r'(?<=Banco )(\w+)', bancocard)
    bcardlist = ', '.join(bcardw)
    consig = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[2]/span").text
    card = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[3]/span").text
    try:
        WebDriverWait(self.driver, self.delay).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="main"]/div[1]/h2')))
        print('CPF Valido')
        print(nome, age, beneficio, concessao, salario, bankslist, bcardlist, consig, card)
    except NoSuchElementException:
        print('CPF Invalido')
    nomes.append(nome)
    idades.append(age)
    beneficios.append(beneficio)
    concessoes.append(concessao)
    salarios.append(salario)
    bancoss.append(bankslist)
    bancoscard.append(bcardlist)
    consigs.append(consig)
    cards.append(card)
return nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards
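I am using try with a page element that is shown when the client code is correct, so the except NoSuchElementException branch should print CPF Invalido and let the code continue, searching for the remaining client codes.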
In case 2, the error is:
Traceback (most recent call last):
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 47, in <module>
    cpf_updater.process_cpf_list()
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 32, in process_cpf_list
    nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
  File "C:\Users\MOISA\PycharmProjects\inss2\k_bot.py", line 66, in search_cpfs
    nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: /html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2
In case 3, it gives:
Traceback (most recent call last):
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 47, in <module>
    cpf_updater.process_cpf_list()
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 32, in process_cpf_list
    nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
  File "C:\Users\MOISA\PycharmProjects\inss2\k_bot.py", line 66, in search_cpfs
    nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 241, in check_response
    raise exception_class(message, screen, stacktrace, alert_text)
selenium.common.exceptions.UnexpectedAlertPresentException: Alert Text: None
Message: Dismissed user prompt dialog: Nenhum benefício foi localizado para este CPF.
This is cpf_updater:
def process_cpf_list(self):
    cpfs = self.sheet.col_values(self.cpf_col)[1:]
    bot_url = BOT(cpfs)
    try:
        nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
        print("Atualizando...")
        for i in range(len(nomes)):
            self.sheet.update_cell(i + 2, self.nome_col, nomes[i])
            self.sheet.update_cell(i + 2, self.age_col, idades[i])
            self.sheet.update_cell(i + 2, self.beneficio_col, beneficios[i])
            self.sheet.update_cell(i + 2, self.concessao_col, concessoes[i])
            self.sheet.update_cell(i + 2, self.salario_col, salarios[i])
            self.sheet.update_cell(i + 2, self.bancos_col, bancoss[i])
            self.sheet.update_cell(i + 2, self.bancocard_col, bancoscard[i])
            self.sheet.update_cell(i + 2, self.consig_col, consigs[i])
            self.sheet.update_cell(i + 2, self.card_col, cards[i])
    except NoSuchElementException:
        print('CPF Invalido')
        pass

cpf_updater = CpfSearch('TESTE')
cpf_updater.process_cpf_list()
Solution
For case 2:
You are getting a NoSuchElementException at line 47 of cpf_updater.py. You should wrap the relevant part in a try/except block and handle NoSuchElementException.
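Since the search button does nothing in case 2, incomplete codes could also be filtered out before the page is touched. A minimal sketch of such a pre-check, assuming a complete CPF has 11 digits and that punctuation such as dots and dashes can be ignored; `has_all_digits` and `CPF_DIGITS` are hypothetical names, not part of the original code:

```python
import re

CPF_DIGITS = 11  # assumption: a complete CPF has 11 digits


def has_all_digits(cpf):
    """Return True when the value contains exactly CPF_DIGITS digits.

    Punctuation such as '.' and '-' is ignored. This only counts digits;
    it does not verify the CPF check digits.
    """
    digits = re.sub(r"\D", "", str(cpf))
    return len(digits) == CPF_DIGITS
```

A code that fails this check could be reported as CPF Invalido and skipped before `send_keys` is called.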
For case 3: at the same line, you should also handle UnexpectedAlertPresentException. This exception usually occurs when a modal, a popup, or an alert appears.
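If the popup is a native browser alert, one option is to read and close it explicitly right after clicking the search button. The sketch below assumes a Selenium `driver` object; `read_and_close_alert` is a hypothetical helper, and the stand-in exception class exists only so the snippet runs where Selenium is not installed:

```python
try:
    from selenium.common.exceptions import NoAlertPresentException
except ImportError:
    class NoAlertPresentException(Exception):
        """Stand-in used only when Selenium is not installed."""


def read_and_close_alert(driver):
    """Return the open alert's text and accept it, or None if no alert is open."""
    try:
        alert = driver.switch_to.alert
        text = alert.text  # raises NoAlertPresentException if nothing is open
    except NoAlertPresentException:
        return None
    alert.accept()
    return text
```

The traceback in the question reports "Dismissed user prompt dialog", which suggests the driver had already closed the prompt itself. In that configuration this helper may find nothing, and catching UnexpectedAlertPresentException is still needed.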
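I'm not entirely sure which statement corresponds to line 47 of cpf_updater.py, but that is where the problem is.
Edit: in addition to handling the two exceptions above, it looks like you need to put the following inside a try block. I think the error is raised by the function call on the first line; the result variables depend on that call.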
nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
print("Atualizando...")
for i in range(len(nomes)):
    self.sheet.update_cell(i + 2, self.nome_col, nomes[i])
    self.sheet.update_cell(i + 2, self.age_col, idades[i])
    self.sheet.update_cell(i + 2, self.beneficio_col, beneficios[i])
    self.sheet.update_cell(i + 2, self.concessao_col, concessoes[i])
    self.sheet.update_cell(i + 2, self.salario_col, salarios[i])
    self.sheet.update_cell(i + 2, self.bancos_col, bancoss[i])
    self.sheet.update_cell(i + 2, self.bancocard_col, bancoscard[i])
    self.sheet.update_cell(i + 2, self.consig_col, consigs[i])
    self.sheet.update_cell(i + 2, self.card_col, cards[i])
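Both tracebacks show the exception being raised inside search_cpfs, at the first find_element_by_xpath lookup, which runs before the try block in that method. One way to move on to the next client code is therefore to handle the exceptions per CPF inside the loop. This is a sketch under assumptions: `scrape_one` is a hypothetical callable that does the page interaction and lookups for a single CPF and returns its fields, and the stand-in exception classes are only used where Selenium is not installed:

```python
try:
    from selenium.common.exceptions import (
        NoSuchElementException,
        TimeoutException,
        UnexpectedAlertPresentException,
    )
except ImportError:
    # Stand-ins so the sketch runs without Selenium installed.
    class NoSuchElementException(Exception):
        pass

    class TimeoutException(Exception):
        pass

    class UnexpectedAlertPresentException(Exception):
        pass

# Exceptions that mean "this CPF gave no result page".
SKIPPABLE = (NoSuchElementException, TimeoutException, UnexpectedAlertPresentException)


def search_cpfs(cpfs, scrape_one):
    """Scrape every CPF; an invalid one yields None instead of ending the loop."""
    rows = []
    for cpf in cpfs:
        try:
            rows.append(scrape_one(cpf))  # hypothetical per-CPF scraping step
        except SKIPPABLE:
            print(f"CPF Invalido: {cpf}")
            rows.append(None)  # placeholder keeps one entry per CPF
    return rows
```

Because the result has one entry per CPF, index i still maps to sheet row i + 2, and entries that are None can be skipped when calling update_cell. TimeoutException is included because WebDriverWait(...).until(...) raises it, rather than NoSuchElementException, when the awaited element never appears.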