Trying to scrape with Selenium but cannot iterate over dynamic content

Problem description

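I am trying to scrape this website. I get the first set of data, but when I try to iterate with a for loop, it returns an error. I have tried changing the user_data class name several times, but it still refuses to iterate.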

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)

driver.get('https://soundcloud.com/jujubucks')
print(driver.title)

user_data = driver.find_elements_by_class_name('userStreamItem')

for user in user_data:
    search = driver.find_element_by_xpath('.//span[@class="soundTitle__usernameText"]')
    search_song = driver.find_element_by_xpath('.//span[@class=""]')
    search_date = driver.find_element_by_xpath('.//span[@class="sc-visuallyhidden"]')
    stats = driver.find_element_by_css_selector('.soundStats.sc-ministats-group .sc-visuallyhidden')

    print(f'''

    Artist: {search.text}
    Song Title: {search_song.text}
    Upload Date: {search_date.text}
    Track Plays: {stats.text}

    ''')

    driver.quit()

Tags: python, html, selenium, web-scraping, iterator

Solution


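There are several problems here. Inside the loop, every lookup is called on driver instead of on the current user element, so each iteration searches the whole page and returns the same first match. driver.quit() is also indented inside the loop, so the browser is closed after the first iteration and the next lookup raises an error. Finally, the page is rendered dynamically, so the elements have to be waited for before they can be read.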
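The scoping problem can be reproduced without a browser. The sketch below is only an analogy: it uses Python's standard xml.etree.ElementTree on an invented snippet (the markup, class names, and song titles are made up for illustration) to show why a lookup has to start from the current item rather than from the document root. Selenium behaves the same way when find_element is called on driver instead of on the element.

```python
import xml.etree.ElementTree as ET

# Invented markup standing in for a list of stream items (not SoundCloud's real HTML).
PAGE = """
<div class="stream">
  <div class="item"><span class="title">Song A</span></div>
  <div class="item"><span class="title">Song B</span></div>
  <div class="item"><span class="title">Song C</span></div>
</div>
""".strip()

root = ET.fromstring(PAGE)
items = root.findall(".//div[@class='item']")

# Searching from the root on every iteration always yields the first match.
from_root = [root.find(".//span[@class='title']").text for _ in items]

# Searching from each item yields that item's own title.
from_item = [item.find(".//span[@class='title']").text for item in items]

print(from_root)
print(from_item)
```

The first list repeats the same title three times, while the second contains one title per item, which is the behaviour the loop in the question needs.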
To select the data for all the songs initially displayed on the page, you can do the following:

import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains


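# Raw string so the backslashes in the Windows path are not treated as escapes.
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
wait = WebDriverWait(driver, 20)
actions = ActionChains(driver)

driver.get('https://soundcloud.com/jujubucks')
# Accept the cookie banner before interacting with the page.
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()

# Wait until at least one stream item is visible.
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".userStreamItem")))
time.sleep(1)
songs = driver.find_elements(By.CLASS_NAME, 'userStreamItem')
for i in range(1, len(songs)+1):
    # Re-locate each item by its 1-based XPath position.
    song = driver.find_element(By.XPATH, f"(//div[@class='userStreamItem'])[{i}]")
    # Move to the item so it is scrolled into view, then give it a moment to render.
    actions.move_to_element(song).perform()
    time.sleep(0.5)
    # Search relative to song (note the leading "."), not the whole page.
    search = song.find_element(By.XPATH, './/span[@class="soundTitle__usernameText"]')
    search_song = song.find_element(By.XPATH, './/span[@class=""]')
    search_date = song.find_element(By.XPATH, './/span[@class="sc-visuallyhidden"]')
    stats = song.find_element(By.XPATH, ".//ul[@class='soundStats sc-ministats-group']//span[@class='sc-visuallyhidden']")
    print(f'''

    Artist: {search.text}
    Song Title: {search_song.text}
    Upload Date: {search_date.text}
    Track Plays: {stats.text}

    ''')

# Close the browser only after the loop has finished.
driver.quit()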

However, this only retrieves the data for 5 songs. To get more, you have to scroll the page.
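One common way to do the scrolling is to jump to the bottom of the page repeatedly until the document height stops growing. The following is a minimal sketch under that assumption, not something taken from the original answer: the helper name scroll_to_load_all and its pause and max_rounds parameters are hypothetical, and it relies only on the driver's execute_script method. Whether the page keeps loading items on each scroll is an assumption about the site.

```python
import time


def scroll_to_load_all(driver, pause=1.0, max_rounds=20):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object with an execute_script(script) method,
    such as a Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render more items.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            # Nothing new was added, so assume the end of the list was reached.
            break
        last_height = new_height
    return rounds
```

Calling scroll_to_load_all(driver) after the first wait and before collecting the userStreamItem elements should make the later lookup see every item that was loaded. The max_rounds cap keeps the loop from running indefinitely on a very long stream.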

