python - Selenium scrolling dilemma
Problem description
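This is the only code I could find that scrolls down to the end of the page; nothing else worked. The problem is that the While True loop never finishes: it keeps trying to scroll down even after it has reached the bottom, so it never moves on to the next step and prints the results. How do I end the While True loop and print the results? Thanks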
from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')

for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")
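The code above computes last_height but never compares anything against it, so nothing can stop the `while True` loop. A common way to end such a loop is to scroll, pause, re-read the page height, and break once the height stops changing. The sketch below expresses that exit condition as a plain function that takes the scroll and height operations as callables, so the logic can be exercised without a browser. `scroll_until_stable`, `FakePage`, the 2-second default pause, the `max_rounds` cap and the 30-pixel row height are all hypothetical choices, and the approach assumes the page height actually grows as new rows load.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    pause: float = 2.0,
    max_rounds: int = 200,
) -> int:
    """Scroll repeatedly until the page height stops growing.

    Returns the number of scroll rounds performed. max_rounds is a
    hypothetical safety cap so the loop can never run forever.
    """
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll()
        time.sleep(pause)  # give the page time to load new rows
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so we are at the bottom
        last_height = new_height
    return rounds


class FakePage:
    """Hypothetical stand-in for a page that loads 150 rows per scroll."""

    def __init__(self, total_rows: int, rows_per_load: int = 150, row_px: int = 30):
        self.total_rows = total_rows
        self.rows_per_load = rows_per_load
        self.row_px = row_px
        self.loaded = min(total_rows, rows_per_load)  # first batch on page load

    def scroll(self) -> None:
        self.loaded = min(self.total_rows, self.loaded + self.rows_per_load)

    def height(self) -> int:
        return self.loaded * self.row_px


page = FakePage(total_rows=4894)
rounds = scroll_until_stable(page.scroll, page.height, pause=0)
print(page.loaded, rounds)  # → 4894 33
```

With a live driver, `scroll` could be `lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `get_height` could be `lambda: driver.execute_script("return document.body.scrollHeight")`. If the rows load inside an inner scrolling element rather than the page body, `document.body.scrollHeight` may never change, and this check would stop after the first round.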
Errors I'm receiving
>>> from selenium import webdriver
>>> url = 'http://www.tradingview.com/screener'
>>> driver = webdriver.Firefox()
>>> driver.get(url)
>>>
>>> # Get scroll height
... last_height = driver.execute_script("return document.body.scrollHeight")
>>>
>>> selector = '.js-field-total.tv-screener-table__field-value--total'
>>> matches = driver.find_element_by_css_selector(selector)
>>> matches = int(matches.text.split()[0])
>>>
>>> visible_rows = 0
>>> scrolls = 0
>>>
>>> while visible_rows < matches:
...
  File "<stdin>", line 2
    ^
IndentationError: expected an indented block
>>>     driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "<stdin>", line 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    ^
IndentationError: unexpected indent
>>>
>>>     # Wait 10 scrolls before updating row information
...     if scrolls == 10:
  File "<stdin>", line 2
    if scrolls == 10:
    ^
IndentationError: unexpected indent
>>>         table = driver.find_elements_by_class_name('tv-data-table__tbody')
  File "<stdin>", line 1
    table = driver.find_elements_by_class_name('tv-data-table__tbody')
    ^
IndentationError: unexpected indent
>>>         visible_rows = len(table[1].find_elements_by_tag_name('tr'))
  File "<stdin>", line 1
    visible_rows = len(table[1].find_elements_by_tag_name('tr'))
    ^
IndentationError: unexpected indent
>>>         scrolls = 0
  File "<stdin>", line 1
    scrolls = 0
    ^
IndentationError: unexpected indent
>>>
>>>     scrolls += 1
  File "<stdin>", line 1
    scrolls += 1
    ^
IndentationError: unexpected indent
>>>
>>> # will give a list of all tickers
... tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')
>>>
>>> for index in range(len(tickers)):
...     print("Row " + tickers[index].text + " ")
...
Solution
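The screener page shows how many rows (matches) the table contains. So one option is to compare the number of visible rows with that total and exit the loop once the visible rows reach it.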
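The loop logic of this approach (refresh the visible-row count periodically and stop once it reaches the expected total) can be written as a small function and checked against a simulated page. This is a sketch: `scroll_until_count` and the simulated page are hypothetical, the row count is refreshed every 10 scrolls (roughly like the answer's `scrolls == 10` counter), and the `max_scrolls` cap is an addition so the loop cannot spin forever if the count never reaches the total.

```python
from typing import Callable


def scroll_until_count(
    scroll: Callable[[], None],
    count_rows: Callable[[], int],
    total: int,
    check_every: int = 10,
    max_scrolls: int = 1000,
) -> int:
    """Scroll until count_rows() reports at least `total` rows.

    Counting rows is slower than scrolling, so the count is only
    refreshed every `check_every` scrolls. Returns the scrolls done.
    """
    visible_rows = 0
    scrolls = 0
    while visible_rows < total and scrolls < max_scrolls:
        scroll()
        scrolls += 1
        if scrolls % check_every == 0:
            visible_rows = count_rows()
    return scrolls


# Hypothetical page: 150 rows visible at first, 150 more per scroll, 4894 in total
loaded = [150]


def fake_scroll() -> None:
    loaded[0] = min(4894, loaded[0] + 150)


scrolls = scroll_until_count(fake_scroll, lambda: loaded[0], total=4894)
print(loaded[0], scrolls)  # → 4894 40
```

With Selenium, `count_rows` could wrap the answer's row-counting expression on the second `tv-data-table__tbody` element.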
from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Field on the screener page that shows the total number of matches
selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element(By.CSS_SELECTOR, selector)
matches = int(matches.text.split()[0])

visible_rows = 0
scrolls = 0

while visible_rows < matches:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait 10 scrolls before updating row information
    if scrolls == 10:
        table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
        visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
        scrolls = 0

    scrolls += 1

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')

for ticker in tickers:
    print("Row " + ticker.text + " ")
Edit: Since the previous solution doesn't seem to work in your setup, here's a different approach you can try. The page loads 150 rows at a time, so instead of counting the visible rows, we can take the total number of matches (rows) we expect (e.g. 4894) and divide it by 150 to get the number of times we need to scroll. If we scroll at least that many times, all of the rows should, in theory, be loaded and the code can continue.
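The arithmetic behind `num_loops` for the example total of 4894 rows: 4894 / 150 is about 32.6, which int() truncates to 32, and 32 batches of 150 rows cover only 4800 rows; the five extra scrolls make up the shortfall. The sketch below reproduces the answer's formula and checks it against a simple model. `scrolls_needed`, `rows_loaded`, and the assumption that the first batch is already present on page load are hypothetical.

```python
import math


def scrolls_needed(matches: int, rows_per_load: int = 150, extra: int = 5) -> int:
    """The answer's formula: matches / 150, truncated by int(), plus 5 spare scrolls."""
    return int(matches / rows_per_load + extra)


def rows_loaded(scrolls: int, total: int, rows_per_load: int = 150) -> int:
    """Rows loaded after `scrolls` scrolls, assuming (hypothetically) that the
    first batch is present on page load and each scroll adds one more batch."""
    return min(total, rows_per_load * (1 + scrolls))


matches = 4894
n = scrolls_needed(matches)
print(n, math.ceil(matches / 150))       # → 37 33
print(rows_loaded(n, matches) == matches)  # → True
```

Under this model, 37 scrolls leave a comfortable margin over the 33 batches actually required.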
from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'http://www.tradingview.com/screener'
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)

try:
    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])
except Exception:  # covers TimeoutException as well as a failed int() parse
    print('Problem finding matches, setting default...')
    matches = 4895  # Set default

# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = matches // 150 + 5

for _ in range(num_loops):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2)  # Pause briefly to allow loading time

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')
n_tickers = len(tickers)

msg = 'Correct ' if n_tickers == matches else 'Incorrect '
msg += 'number of tickers ({}) found'
print(msg.format(n_tickers))

for ticker in tickers:
    print("Row " + ticker.text + " ")