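python - Selenium hovers over the wrong FirefoxWebElement instead of the specified element
Problem description
I'm trying to get, for every day in a range of dates, the time of the last Internet Archive snapshot of the Washington Post front page. The problem is that Selenium doesn't always pick the right date, even though I collect a list of date elements and that list appears to be correct. For example, Selenium jumps from January 31 to February 11 instead of February 1.
Printed output: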
Moving to 31
Date: 2020-01-31 00:00:00
Last snapshot taken at 22:59:03
Moving to 1
Date: 2020-02-11 00:00:00
Last snapshot taken at 23:53:32
Moving to 2
Date: 2020-02-02 00:00:00
Last snapshot taken at 23:59:56
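In this output we can see that it should move to 1, and the text of the date element is indeed 1, which means the date elements were extracted from the page correctly. Is Selenium not hovering over the right element?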
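A cheap way to catch this mismatch at run time, instead of by reading the printout, is to compare the day number of the hovered calendar cell with the date in the popup title. The helper below is a hypothetical sketch: `tooltip_matches` is an illustrative name, and it assumes the title uses the same `'%B %d, %Y'` format the scraper already parses (e.g. `February 1, 2020`). Only the day number is compared, because the calendar cell text carries no month.

```python
from datetime import datetime


def tooltip_matches(requested_day: str, tooltip_title: str) -> bool:
    """Return True if the popup title (e.g. 'February 1, 2020') is for the requested day number.

    requested_day is the text of the calendar cell that was hovered, e.g. '1'.
    """
    # Same format string the scraper uses for 'day-tooltip-title'
    shown = datetime.strptime(tooltip_title, "%B %d, %Y")
    return shown.day == int(requested_day)
```

In the loop this could be used as `if not tooltip_matches(dates[i].text, date.text): ...` to log or retry the hover when the wrong popup opened.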
Full code:
import datetime
import time

from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

urls = ['https://web.archive.org/web/20190901000000*/washingtonpost.com',  # 2019 calendar
        'https://web.archive.org/web/*/washingtonpost.com']                  # 2020 calendar
browser = webdriver.Firefox(executable_path='/usr/local/bin/geckodriver')  # brew install geckodriver, then paste the path it was installed to
data = {}
# desired_ranges is built by the date-range code further down
for j in range(0, len(urls)):
    browser.get(urls[j])
    calendar_grid = WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CLASS_NAME, 'calendar-grid')))
    if j == 0:  # 2019
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(10)
    # Collect the day cells before computing the loop bounds from them
    dates = calendar_grid.find_elements(By.CSS_SELECTOR, '.calendar-day')
    print('Dates on page: ' + str(len(dates)))
    if j == 0:  # 2019: walk backwards from the last day cell
        start, end, step = len(dates) - 1, 0, -1
    else:  # 2020: walk forwards from the first day cell
        start, end, step = 0, len(dates), 1
    for i in range(start, end, step):
        if j == 0 and len(data) == len(desired_ranges[j]):  # all 2019 dates collected
            break
        # Hover over the date, let popup appear, wait for loader to disappear, select scroll area
        print('Moving to ' + dates[i].text)
        hov = ActionChains(browser).move_to_element(dates[i])
        hov.perform()
        popup = WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.popup-of-day-content')))
        WebDriverWait(browser, 20).until(EC.invisibility_of_element_located((By.TAG_NAME, 'svg')))
        scroll_area = WebDriverWait(browser, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.popup-of-day-content > ul > div')))
        # Get date and check that it is in our range
        date = popup.find_element(By.CLASS_NAME, 'day-tooltip-title')
        date_formatted = datetime.datetime.strptime(date.text, '%B %d, %Y')
        print('Date: ' + str(date_formatted))
        if date_formatted not in desired_ranges[j]:
            continue  # skip if it is not
        attempts = 0
        while attempts < 5:
            try:
                # Scroll the popup's list to the bottom, then take the last snapshot link
                browser.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scroll_area)
                snapshots = popup.find_elements(By.TAG_NAME, 'a')
                last_snapshot = snapshots[-1]
                print('Last snapshot taken at ' + last_snapshot.text)
                data[date_formatted] = {'link': last_snapshot.get_attribute('href'),
                                        'time': last_snapshot.text,
                                        'headlines': []}
                break
            except StaleElementReferenceException:
                attempts += 1
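The `while attempts < 5` loop retries the popup scrape when Selenium raises `StaleElementReferenceException`, but after five failures it gives up silently and that date is just missing from `data`. A small generic helper makes the pattern reusable and re-raises the last error instead of swallowing it. This is a sketch using only the standard library; `retry` is a hypothetical name, and in the scraper `retry_on` would presumably be `(StaleElementReferenceException,)`.

```python
import time


def retry(action, attempts=5, retry_on=(Exception,), delay=0.0):
    """Call action() up to `attempts` times, retrying only on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error instead of hiding it
            time.sleep(delay)  # optional pause before the next try
```

Hypothetical usage: `retry(lambda: scrape_last_snapshot(popup, scroll_area), retry_on=(StaleElementReferenceException,))`, where `scrape_last_snapshot` would wrap the body of the `try` block.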
Additional code that sets up the date ranges:
import datetime

import pandas as pd

start_day = datetime.date(2019, 12, 8)
end_day = datetime.date.today()
days = (end_day - start_day).days
desired_range = pd.date_range(start_day, periods=days).tolist()  # Timestamps from start_day up to, not including, today
print('Range: ' + str(start_day) + ' to ' + str(end_day))
print('Days: ' + str(days))

def time_in_range(start, end, x):
    '''Return True if x is in the range [start, end] (wrapping around if start > end)'''
    if start <= end:
        return start <= x <= end
    else:
        return start <= x or x <= end

# Split the range by year. Bounds are Timestamps because ordering a Timestamp
# against a plain datetime.date is deprecated in pandas.
desired_range_in_2019 = [x for x in desired_range if time_in_range(pd.Timestamp('2019-01-01'), pd.Timestamp('2019-12-31'), x)]
desired_range_in_2020 = [x for x in desired_range if time_in_range(pd.Timestamp('2020-01-01'), pd.Timestamp('2020-12-31'), x)]
desired_ranges = [desired_range_in_2019, desired_range_in_2020]
print('Dates in 2019: ' + str(len(desired_range_in_2019)))
print('Dates in 2020: ' + str(len(desired_range_in_2020)))
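The per-year split can also be done with the standard library alone, which avoids mixing pandas `Timestamp`s with `datetime.date` objects. The sketch below (`dates_by_year` is an illustrative name) reproduces `pd.date_range(start_day, periods=days)`, so `end_day` itself is excluded, and groups the days by calendar year.

```python
from collections import defaultdict
from datetime import date, timedelta


def dates_by_year(start: date, end: date) -> dict:
    """Group every day in [start, end) by calendar year; `end` itself is excluded."""
    grouped = defaultdict(list)
    for offset in range((end - start).days):
        day = start + timedelta(days=offset)
        grouped[day.year].append(day)
    return dict(grouped)
```

Because the scraper parses the tooltip into a `datetime.datetime`, membership checks against these `datetime.date` lists should use `date_formatted.date()`: a `datetime.datetime` never compares equal to a plain `datetime.date`.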
Solution
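This is most likely happening because ActionChains simulates the movement of the user's mouse. On its way to the target element, the cursor passes over other calendar days, and the popup of one of those days can satisfy your expected conditions before the cursor reaches the day you asked for. The issue can be reproduced reliably with network throttling: set the network to Regular 3G in the developer tools and you will be able to observe the problem for the February 11 and April 5 cases.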
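As a rough illustration of that explanation: if the driver moves the pointer along a straight line and reports intermediate positions on the way (an assumption about how the move is interpolated), every calendar cell the line crosses gets hovered. The toy model below treats the calendar as a uniform grid of `cell_w` by `cell_h` pixel cells and lists the cells visited in order; the function name and the grid model are hypothetical.

```python
def cells_on_path(start, end, cell_w, cell_h, steps=20):
    """Return the (row, col) grid cells a straight-line pointer path visits, in order."""
    (x0, y0), (x1, y1) = start, end
    visited = []
    for k in range(steps + 1):
        t = k / steps  # fraction of the move completed
        x = x0 + (x1 - x0) * t
        y = y0 + (y1 - y0) * t
        cell = (int(y // cell_h), int(x // cell_w))
        if not visited or visited[-1] != cell:  # record each newly entered cell once
            visited.append(cell)
    return visited
```

For example, `cells_on_path((5, 5), (105, 45), 20, 20)` starts in cell `(0, 0)`, ends in cell `(2, 5)`, and passes through cells in between, any of which could open its own popup first.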
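There are some creative ways to change the cursor's path so that the problem goes away (changing the browser window width, moving through the empty calendar cells in order, etc.), but the most reliable one is to take the cursor out of the picture entirely. Replace the hover logic with the following snippet.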
# Hover over the date, let popup appear, wait for loader to disappear, select scroll area
print('Moving to ' + dates[i].text)
# Dispatch a synthetic 'mouseover' directly on the target cell instead of moving
# the real pointer, so no other calendar day is hovered on the way
browser.execute_script("""
    arguments[0].addEventListener('mouseover', function() {
    });
    var event = new MouseEvent('mouseover', {
        'view': window,
        'bubbles': true,
        'cancelable': true
    });
    arguments[0].dispatchEvent(event);""", dates[i])
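As an additional safeguard, the wait itself can check that the popup belongs to the day that was targeted instead of accepting any visible popup. `WebDriverWait.until()` repeatedly calls the condition it is given with the driver until the result is truthy, so a plain callable works. The class name `tooltip_date_is` is hypothetical; the locator tuple uses the `'class name'` strategy string, which is the value of `By.CLASS_NAME`.

```python
from datetime import date, datetime


class tooltip_date_is:
    """Hypothetical wait condition: truthy once a popup title shows the expected date."""

    def __init__(self, locator, expected: date):
        self.locator = locator    # e.g. ('class name', 'day-tooltip-title')
        self.expected = expected  # the datetime.date that was meant to be hovered

    def __call__(self, driver):
        for element in driver.find_elements(*self.locator):
            try:
                shown = datetime.strptime(element.text, "%B %d, %Y").date()
            except ValueError:
                continue  # title empty or not a date yet
            if shown == self.expected:
                return element  # WebDriverWait.until() returns this element
        return False  # keep polling
```

Hypothetical usage, with `expected_date` being the `datetime.date` you meant to hover: `WebDriverWait(browser, 20, ignored_exceptions=[StaleElementReferenceException]).until(tooltip_date_is((By.CLASS_NAME, 'day-tooltip-title'), expected_date))`. Passing `ignored_exceptions` keeps the wait polling if the popup is re-rendered while its title is being read.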