javascript - Warning alert pops up while scraping
Problem description
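I am using Selenium to scrape this website: https://www.fedsdatacenter.com/federal-pay-rates/index.php?y=all&n=&l=&a=&o=
My code works fine. It keeps clicking Next and parsing the table until this warning message appears:
DataTables warning: table id=table-example - Invalid JSON response.
My code stops because of this error. Clicking Next manually gives me the same warning.
Here is my code. What can I do about this? If there is any way to improve my code, please help me.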
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import ElementNotVisibleException
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv
import time
def has_class_onclick(tag):
    return tag.has_attr('onclick')
def extract_table_content_into_rows(website_lists):
    # Extract the rows of the first table on each page and return them as lists of cell strings.
    list_of_row = []
    for table_page in website_lists:
        soup_page = BeautifulSoup(table_page, "html.parser")
        soup_table_raw = soup_page.find("table")
        if not soup_table_raw:
            continue
        soup_table = soup_table_raw.find("tbody")
        if soup_table is None:
            continue
        for soup_row in soup_table.find_all("tr"):
            row_content = []
            for soup_column in soup_row.find_all("td"):
                # get_text() also works when a cell's first child is a tag
                # (contents[0].strip() would fail there); empty cells become ".".
                column_content = soup_column.get_text(strip=True)
                row_content.append(column_content if column_content else ".")
            list_of_row.append(row_content)
    return list_of_row
def csv_writer(lists_of_row):
    # Append the table rows to federal.csv, using one writer for the whole batch.
    with open("federal.csv", "a", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows(lists_of_row)
# With no arguments, Selenium looks for chromedriver on PATH (Selenium 4.6+ can also fetch it automatically).
driver = webdriver.Chrome()
driver.get('https://www.fedsdatacenter.com/federal-pay-rates/index.php?y=all&n=&l=&a=&o=')
# Open the DataTables "entries per page" dropdown and pick its fourth option.
driver.find_element(By.XPATH, '//*[@id="table-example_length"]/label/select').click()
time.sleep(3)
driver.find_element(By.XPATH, '//*[@id="table-example_length"]/label/select/option[4]').click()
time.sleep(3)
page_num = 1
# Pages 1-5: parse the current table, click Next, then sleep while the new page loads.
while 1 <= page_num <= 5:
    row_list = extract_table_content_into_rows([driver.page_source])
    print(row_list)
    csv_writer(row_list)
    driver.find_element(By.XPATH, '//*[@id="table-example_next"]/a').click()
    time.sleep(3)
    print(page_num)
    page_num += 1
# From page 6 on: after clicking Next, poll until the sixth pagination link
# shows page_num + 2 before parsing again.
while page_num > 5:
    row_list = extract_table_content_into_rows([driver.page_source])
    print(row_list)
    csv_writer(row_list)
    driver.find_element(By.XPATH, '//*[@id="table-example_next"]/a').click()
    not_find = True
    while not_find:
        try:
            while driver.find_element(
                By.XPATH, '//*[@id="table-example_paginate"]/ul/li[6]/a'
            ).text != str(page_num + 2):
                time.sleep(0.1)
            not_find = False
        except StaleElementReferenceException:
            # The pagination links were redrawn while being read; look them up again.
            continue
    print(page_num)
    page_num += 1
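Solution
One approach is to use a bit of JavaScript to disable all alerts on the page. Run it after driver.get(...) has loaded the page, because loading a new page restores the original window.alert: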
driver.execute_script('window.alert = function() {};')
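A slightly broader sketch follows. It assumes the page loads DataTables as a jQuery plugin: the "DataTables warning" text points to DataTables, but the jQuery global is an assumption. Besides replacing window.alert, it sets DataTables' errMode option to 'none', so the library stops reporting errors through alerts. The helper name suppress_alerts is hypothetical. It only needs an object with an execute_script method, such as a Selenium WebDriver.

```python
# Hypothetical helper: stop DataTables error popups from blocking Selenium.
# Assumption: the page exposes DataTables through the jQuery global.
SUPPRESS_ALERTS_JS = """
window.alert = function () {};
if (window.jQuery && jQuery.fn.dataTable) {
    // 'none' tells DataTables not to report errors with alert()
    jQuery.fn.dataTable.ext.errMode = 'none';
}
"""


def suppress_alerts(driver):
    """Disable window.alert and, if DataTables is present, its alert error mode.

    Call this after driver.get(...). A new page load resets window, but
    DataTables paging over AJAX does not, so one call covers all pages.
    """
    driver.execute_script(SUPPRESS_ALERTS_JS)
```

In the question's script this would go right after the driver.get(...) line. Hiding the popup only stops it from blocking Selenium. The page whose request returned invalid JSON is presumably still not drawn, so a loop that waits for the pager to change may need a timeout or a retry of the Next click.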