Why do I get empty results from Selenium on this ajax-based music website?

Problem description

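Hi~ I'm a complete Python beginner. I recently tried to use Selenium (find_elements_by_css_selector) to scrape the singer names and song names from my own favorites list.

URL: https://www.xiami.com/favorite/88955424

But it failed: the returned element lists are empty, and I don't know why.

The music website is ajax-based.

Here is what the empty results look like in the console, which makes me sad: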

[]
[]
[]
situation(song amount)(singer amount)(album amount): 0 0 0

Here is my original script:

from selenium import webdriver
import mysql.connector
import time

class xiami():
   def __init__(self):
       self.url='https://www.xiami.com/favorite/88955424'

   def turn_on_url(self):
       self.browser = webdriver.Chrome()
       self.browser.get(self.url)
       self.browser.maximize_window()
       self.browser.implicitly_wait(8)

   def get_page_data(self):#get infos of singers and songs and albums

       self.song_names=self.browser.find_elements_by_css_selector('div[class="song-name em"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')#song name
       self.singers=self.browser.find_elements_by_css_selector('div[class="singers"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')
       self.albums=self.browser.find_elements_by_css_selector('div[class="album"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')
       print(self.song_names)
       print(self.singers)
       print(self.albums)
       print('situation(song amount)(singer amount)(album amount):',len(self.song_names),len(self.singers),len(self.albums))

if __name__=='__main__':
   xiami=xiami()
   xiami.turn_on_url()
   xiami.get_page_data()


Tags: python-3.x, selenium, selenium-webdriver, web-crawler, webdriverwait

Solution


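To scrape the song names, singers, and albums from your favorites list with Selenium, wait explicitly with WebDriverWait until the ajax-loaded rows are visible, and locate the links by their position inside the table rows instead of by the data-spm-anchor-id attribute used in the original selectors: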

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("start-maximized")
    driver = webdriver.Chrome(options=chrome_options)
    driver.get("https://www.xiami.com/favorite/88955424")
    # Wait up to 20 s for the ajax-loaded table rows to become visible, then collect each link's text
    song_names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[contains(@class, 'song-name')]/a")))]
    singers = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[@class='singers']/a")))]
    albums = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[@class='album']/a")))]
    # zip() pairs the three lists by position and stops at the shortest one
    for a, b, c in zip(song_names, singers, albums):
        print("Song {} is by {} from {} album.".format(a, b, c))
    driver.quit()
    
  • Console output:

    Song Reckless is by Arin Ray from Platinum Fire (Deluxe) album.
    Song Grey Area is by Jerry Paper from Like a Baby album.
    Song Open Up the Door is by Weyes Blood from Truelove's Gutter album.
    Song Looking For Your Love is by Richard Hawley from Looking For Your Love album.
    Song Blue Lips is by HUM?NIGHTM?RE from Invitation to Her's album.
    Song Nicolo Paganini: Introduction and Variations on Nel cor piu non mi sento from Paisiello's La molinar is by Her's from Paganini: In cor più non mi sento; 3 Duetti; Divertimenti carnevaleschi album.
    Song Layin Low is by Niccolò Paganini from MFSB album.
    Song Don't Start Givin' Up is by Stefan Milenkovic from Flashes Of Life album.
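In the output above, some rows appear to be shifted: for example, "Layin Low" is credited to Niccolò Paganini and a Paganini piece to Her's. That pattern is consistent with zipping three independently collected lists when a row has more than one singer link (or none), so the lists drift out of step. A minimal sketch, assuming each table row holds its own song, singer, and album links under the same markup the XPaths above target, is to read each row as a unit. The row-relative XPaths, the Track record, and the tracks_from_rows helper are hypothetical names for illustration, and the site's markup may have changed. The function only relies on the find_elements(by, value) method and .text attribute that Selenium's WebElement provides; "xpath" is the same string value as Selenium's By.XPATH.

```python
from typing import List, NamedTuple, Optional

# Same string value as selenium.webdriver.common.by.By.XPATH
XPATH = "xpath"

# Row locator taken from the answer above; the row-relative locators below
# are assumptions that mirror the answer's absolute XPaths.
ROW_XPATH = "//table//tbody//tr[@class='odd' or @class='even']"
SONG_XPATH = ".//div[contains(@class, 'song-name')]/a"
SINGER_XPATH = ".//div[@class='singers']/a"
ALBUM_XPATH = ".//div[@class='album']/a"


class Track(NamedTuple):
    """One favorites-list row (hypothetical record type)."""
    song: Optional[str]
    singers: List[str]
    album: Optional[str]


def _texts(row, xpath: str) -> List[str]:
    """Return the text of every element matching a row-relative XPath."""
    return [elem.text for elem in row.find_elements(XPATH, xpath)]


def tracks_from_rows(rows) -> List[Track]:
    """Build one Track per table row so song, singers and album stay aligned."""
    tracks = []
    for row in rows:
        songs = _texts(row, SONG_XPATH)
        albums = _texts(row, ALBUM_XPATH)
        tracks.append(
            Track(
                song=songs[0] if songs else None,
                singers=_texts(row, SINGER_XPATH),  # may contain several artists
                album=albums[0] if albums else None,
            )
        )
    return tracks
```

With a live driver, the rows could come from WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, ROW_XPATH))), and each Track can then be printed with ", ".join(track.singers) so multi-artist songs stay on one line instead of pushing later rows out of alignment.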
    
