Selenium scrolling dilemma

Problem description

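This is the only code I could find that scrolls down to the end of the page; nothing else worked. The problem is that the while True loop never finishes: it keeps trying to scroll down even after it reaches the bottom, so the script never gets to the print step. How can I end the while True loop and print the results? Thanks.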

from selenium import webdriver

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# will give a list of all tickers
tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')

for index in range(len(tickers)):
    print("Row " + tickers[index].text + " ")

Errors I'm receiving


>>> from selenium import webdriver
>>> url = 'http://www.tradingview.com/screener'
>>> driver = webdriver.Firefox()
>>> driver.get(url)
>>>
>>> # Get scroll height
... last_height = driver.execute_script("return document.body.scrollHeight")
>>>
>>> selector = '.js-field-total.tv-screener-table__field-value--total'
>>> matches = driver.find_element_by_css_selector(selector)
>>> matches = int(matches.text.split()[0])
>>>
>>> visible_rows = 0
>>> scrolls = 0
>>>
>>> while visible_rows < matches:
...
  File "<stdin>", line 2

    ^
IndentationError: expected an indented block
>>>     driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "<stdin>", line 1
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    ^
IndentationError: unexpected indent
>>>
>>>     # Wait 10 scrolls before updating row information
...     if scrolls == 10:
  File "<stdin>", line 2
    if scrolls == 10:
    ^
IndentationError: unexpected indent
>>>         table = driver.find_elements_by_class_name('tv-data-table__tbody')
  File "<stdin>", line 1
    table = driver.find_elements_by_class_name('tv-data-table__tbody')
    ^
IndentationError: unexpected indent
>>>         visible_rows = len(table[1].find_elements_by_tag_name('tr'))
  File "<stdin>", line 1
    visible_rows = len(table[1].find_elements_by_tag_name('tr'))
    ^
IndentationError: unexpected indent
>>>         scrolls = 0
  File "<stdin>", line 1
    scrolls = 0
    ^
IndentationError: unexpected indent
>>>
>>>     scrolls += 1
  File "<stdin>", line 1
    scrolls += 1
    ^
IndentationError: unexpected indent
>>>
>>> # will give a list of all tickers
... tickers = driver.find_elements_by_css_selector('a.tv-screener__symbol')
>>>
>>> for index in range(len(tickers)):
...    print("Row " + tickers[index].text + " ")
...

Tags: python, selenium-webdriver

Solution


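Below the ticker, the page shows how many rows (matches) the table contains. So one option is to compare the number of visible rows with the total number of rows. Once the visible row count reaches the total, you exit the loop.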

from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'http://www.tradingview.com/screener'
driver = webdriver.Firefox()
driver.get(url)

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

# Read the total number of matches shown on the page
selector = '.js-field-total.tv-screener-table__field-value--total'
matches = driver.find_element(By.CSS_SELECTOR, selector)
matches = int(matches.text.split()[0])

visible_rows = 0
scrolls = 0

while visible_rows < matches:

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait 10 scrolls before updating row information
    if scrolls == 10:
        table = driver.find_elements(By.CLASS_NAME, 'tv-data-table__tbody')
        visible_rows = len(table[1].find_elements(By.TAG_NAME, 'tr'))
        scrolls = 0

    scrolls += 1

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')

for ticker in tickers:
    print("Row " + ticker.text + " ")
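
The loop above never exits if the visible row count stays below the reported total, for example when the page stops loading rows. A common alternative is to scroll until document.body.scrollHeight stops growing. The following is a minimal sketch, not part of the original answer: scroll_to_end, pause, and max_scrolls are hypothetical names, and it assumes the page grows document.body as rows load (if the screener scrolls an inner container instead, the script would need to target that element).

```python
import time


def scroll_to_end(driver, pause=2.0, max_scrolls=200):
    """Scroll until the page height stops growing; return the scroll count."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for count in range(1, max_scrolls + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly requested rows time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return count  # nothing new loaded, so we are at the bottom
        last_height = new_height
    return max_scrolls  # safety cap reached; the page may still be growing
```

Call scroll_to_end(driver) after driver.get(url), then collect the tickers as before. The function body contains no blank lines, so it can also be pasted into the classic interactive interpreter, where a blank line ends an indented block (the likely cause of the IndentationError messages in the question).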

Edit: Since your setup doesn't seem to allow the previous solution, here's a different approach you can try. The page loads 150 rows at a time. So, instead of counting the number of visible rows, we can use the total matches/rows we're expecting (e.g. 4894) and divide that by 150 to get the number of times we need to scroll. If we scroll at least that many times, in theory, all of the rows should be visible and we can continue with the code.

from time import sleep
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

url = 'http://www.tradingview.com/screener'
# Selenium 3 style; newer Selenium 4 releases expect a Service object instead
driver = webdriver.Chrome('./chromedriver')
driver.get(url)

try:

    selector = '.js-field-total.tv-screener-table__field-value--total'
    condition = EC.visibility_of_element_located((By.CSS_SELECTOR, selector))
    matches = WebDriverWait(driver, 10).until(condition)
    matches = int(matches.text.split()[0])

except (TimeoutException, Exception):
    print ('Problem finding matches, setting default...')
    matches = 4895 # Set default

# The page loads 150 rows at a time; divide matches by
# 150 to determine the number of times we need to scroll;
# add 5 extra scrolls just to be sure
num_loops = int(matches / 150 + 5)

for _ in range(num_loops):

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    sleep(2) # Pause briefly to allow loading time

# will give a list of all tickers
tickers = driver.find_elements(By.CSS_SELECTOR, 'a.tv-screener__symbol')

n_tickers = len(tickers)

msg = 'Correct ' if n_tickers == matches else 'Incorrect '
msg += 'number of tickers ({}) found'
print(msg.format(n_tickers))

for ticker in tickers:
    print("Row " + ticker.text + " ")
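
The scroll count above truncates matches / 150 and then pads the result with five extra scrolls. As a small worked example of that arithmetic, the sketch below computes the minimum number of 150-row batches with a ceiling division. scrolls_needed, rows_per_load, and extra are hypothetical names, and the batch size of 150 comes from the observation in the answer, not from any documented page behavior.

```python
import math


def scrolls_needed(matches, rows_per_load=150, extra=0):
    """Return how many scrolls should reveal `matches` rows."""
    if matches <= 0:
        return extra
    # Round up so a partial final batch still gets its own scroll.
    return math.ceil(matches / rows_per_load) + extra


print(scrolls_needed(4894))           # 33
print(scrolls_needed(4894, extra=5))  # 38
```

For 4894 matches this gives 33 scrolls, or 38 with the same safety margin of 5, one more than the 37 produced by int(4894 / 150 + 5).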
