How to continue if an element is not found - scraping with Python

Question

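I'm scraping a page that is basically a search engine. Client codes (called CPFs) from a spreadsheet are typed into the page, which then returns some information that I scrape back into the sheet. The scraping code is almost finished, but I can't handle invalid client codes.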

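The page works like this:

1 - If the client code is valid, the page redirects and shows some information that I can already scrape;

2 - If the client code doesn't have all of its digits, the "Search" button does nothing;

3 - If the client code has all of its digits but something is wrong with it, the page shows a pop-up.

In cases 2 and 3, I want to print something (CPF Invalido) and move on to the next client code. This is the code I have so far: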

    for cpf in self.cpfs:
        print(f"Procurando {cpf}.")

        self.driver.get(self.bot_url)

        cpf_input = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[1]/input')
        cpf_input.send_keys(cpf)

        time.sleep(2)

        cpfButton = self.driver.find_element_by_xpath('//*[@id="search"]/div/div[2]/button')
        cpfButton.click()

        time.sleep(2)

        self.delay = 3  # seconds

        nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
        idade = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/ul/li[2]").text
        age = re.search(r'\((.*?)Anos', idade).group(1)
        beneficio = self.driver.find_element_by_xpath(
            "/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[5]/span/b").text
        concessao = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[2]/div[2]/span").text
        salario = self.driver.find_element_by_xpath(
            "/html/body/main[1]/div[1]/div[2]/div/div[3]/div[1]/div[1]/span").text
        bancos = self.driver.find_element_by_xpath('//*[@id="loans"]').text
        bancosw = re.findall(r'(?<=Banco )(\w+)', bancos)
        bankslist = ', '.join(bancosw)
        bancocard = self.driver.find_element_by_xpath('//*[@id="cards"]').text
        bcardw = re.findall(r'(?<=Banco )(\w+)', bancocard)
        bcardlist = ', '.join(bcardw)
        consig = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[2]/span").text
        card = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[3]/div[3]/span").text

        try:
            WebDriverWait(self.driver, self.delay).until(
                EC.presence_of_element_located((By.XPATH, '//*[@id="main"]/div[1]/h2')))
            print('CPF Valido')

            print(nome, age, beneficio, concessao, salario, bankslist, bcardlist, consig, card)

        except NoSuchElementException:
            print('CPF Invalido')

        nomes.append(nome)
        idades.append(age)
        beneficios.append(beneficio)
        concessoes.append(concessao)
        salarios.append(salario)
        bancoss.append(bankslist)
        bancoscard.append(bcardlist)
        consigs.append(consig)
        cards.append(card)

    return nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards

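I put the try around a page element that is displayed when the client code is correct, so the except NoSuchElementException should print CPF Invalido and let the code continue, searching for the remaining client codes.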

In case 2, the error is:

Traceback (most recent call last):
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 47, in <module>
    cpf_updater.process_cpf_list()
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 32, in process_cpf_list
    nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
  File "C:\Users\MOISA\PycharmProjects\inss2\k_bot.py", line 66, in search_cpfs
    nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: /html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2

In case 3, it gives:

Traceback (most recent call last):
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 47, in <module>
    cpf_updater.process_cpf_list()
  File "C:/Users/MOISA/PycharmProjects/inss2/cpf_updater.py", line 32, in process_cpf_list
    nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
  File "C:\Users\MOISA\PycharmProjects\inss2\k_bot.py", line 66, in search_cpfs
    nome = self.driver.find_element_by_xpath("/html/body/main[1]/div[1]/div[1]/div[1]/div[1]/h2").text
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\MOISA\PycharmProjects\inss2\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 241, in check_response
    raise exception_class(message, screen, stacktrace, alert_text)
selenium.common.exceptions.UnexpectedAlertPresentException: Alert Text: None
Message: Dismissed user prompt dialog: Nenhum benefício foi localizado para este CPF.

This is cpf_updater:

    def process_cpf_list(self):
        cpfs = self.sheet.col_values(self.cpf_col)[1:]

        bot_url = BOT(cpfs)
        try:
            nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()
            print("Atualizando...")
            for i in range(len(nomes)):
                self.sheet.update_cell(i + 2, self.nome_col, nomes[i])
                self.sheet.update_cell(i + 2, self.age_col, idades[i])
                self.sheet.update_cell(i + 2, self.beneficio_col, beneficios[i])
                self.sheet.update_cell(i + 2, self.concessao_col, concessoes[i])
                self.sheet.update_cell(i + 2, self.salario_col, salarios[i])
                self.sheet.update_cell(i + 2, self.bancos_col, bancoss[i])
                self.sheet.update_cell(i + 2, self.bancocard_col, bancoscard[i])
                self.sheet.update_cell(i + 2, self.consig_col, consigs[i])
                self.sheet.update_cell(i + 2, self.card_col, cards[i])

        except NoSuchElementException:
            print('CPF Invalido')

cpf_updater = CpfSearch('TESTE')
cpf_updater.process_cpf_list()

Tags: python, web-scraping, try-catch, except, nosuchelementexception

Answer


For case 2:

You are getting a NoSuchElementException at line 47 of cpf_updater.py. You should wrap the relevant part in a try/except and handle NoSuchElementException.

For case 3: at the same line, you should also handle UnexpectedAlertPresentException. This exception typically occurs when a modal, a pop-up window, or an alert appears.

I'm not entirely sure which line corresponds to line 47 of cpf_updater.py, but that is where the problem is.
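One way to read the two tracebacks: the exception is raised inside search_cpfs at the first find_element_by_xpath lookup, which runs before the try block, so the except clause never gets a chance to catch it. A minimal sketch of the per-CPF handling follows. The names scrape_all, scrape_one and skip_on are hypothetical, not part of the question's code, and a stand-in scraper replaces Selenium so the sketch runs on its own.

```python
def scrape_all(cpfs, scrape_one, skip_on):
    """Run scrape_one(cpf) for each CPF and return one entry per CPF.

    A CPF whose lookup raises one of the exceptions in skip_on is reported
    and recorded as None, so the results stay aligned with the input order.
    """
    results = []
    for cpf in cpfs:
        print(f"Procurando {cpf}.")
        try:
            # Every element lookup for this CPF must happen inside the try.
            results.append(scrape_one(cpf))
        except skip_on as exc:
            print(f"CPF Invalido ({type(exc).__name__})")
            results.append(None)
    return results


# Stand-in scraper (hypothetical): LookupError plays the role of the
# Selenium exceptions, and the returned values are made-up placeholders.
def fake_scrape(cpf):
    if len(cpf) != 11:
        raise LookupError("element not found")
    return {"cpf": cpf, "nome": "exemplo"}


rows = scrape_all(["12345678901", "123"], fake_scrape, skip_on=(LookupError,))
```

With Selenium, skip_on would presumably be (NoSuchElementException, UnexpectedAlertPresentException, TimeoutException), all imported from selenium.common.exceptions, and scrape_one would contain every lookup for a single CPF. TimeoutException is included because WebDriverWait(...).until(...) raises it, not NoSuchElementException, when the condition is not met in time.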

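Edit: It looks like you need to wrap the following in a try/except that handles the two exceptions mentioned above. I think the error is raised by the function call on the first line; the resulting variables depend on that call.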

    nomes, idades, beneficios, concessoes, salarios, bancoss, bancoscard, consigs, cards = bot_url.search_cpfs()

    print("Atualizando...")
    for i in range(len(nomes)):
        self.sheet.update_cell(i + 2, self.nome_col, nomes[i])
        self.sheet.update_cell(i + 2, self.age_col, idades[i])
        self.sheet.update_cell(i + 2, self.beneficio_col, beneficios[i])
        self.sheet.update_cell(i + 2, self.concessao_col, concessoes[i])
        self.sheet.update_cell(i + 2, self.salario_col, salarios[i])
        self.sheet.update_cell(i + 2, self.bancos_col, bancoss[i])
        self.sheet.update_cell(i + 2, self.bancocard_col, bancoscard[i])
        self.sheet.update_cell(i + 2, self.consig_col, consigs[i])
        self.sheet.update_cell(i + 2, self.card_col, cards[i])
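One thing to watch in the update loop: the row is computed as i + 2, so if invalid CPFs are simply dropped from the result lists, every later row shifts up and lands next to the wrong CPF. A minimal sketch of one way around this follows, assuming the scraper returns one entry per CPF with None for the invalid ones (which is not what the question's code currently does). FakeSheet, write_rows and the column numbers are hypothetical; only the update_cell(row, col, value) call shape is taken from the question.

```python
class FakeSheet:
    """Stand-in that records update_cell calls instead of touching a real sheet."""

    def __init__(self):
        self.cells = {}

    def update_cell(self, row, col, value):
        self.cells[(row, col)] = value


def write_rows(sheet, rows, columns, first_row=2):
    """Write each scraped row to the sheet; None rows (invalid CPFs) are skipped.

    The sheet row is derived from the position in rows, so skipping an
    invalid CPF does not shift the rows that follow it.
    """
    written = 0
    for offset, row in enumerate(rows):
        if row is None:
            continue
        for field, col in columns.items():
            sheet.update_cell(first_row + offset, col, row[field])
        written += 1
    return written


sheet = FakeSheet()
rows = [{"nome": "A", "idade": "70"}, None, {"nome": "B", "idade": "65"}]
columns = {"nome": 2, "idade": 3}  # hypothetical column numbers
count = write_rows(sheet, rows, columns)
```

Here the second CPF is treated as invalid, so sheet row 3 is left untouched while rows 2 and 4 are filled in.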
