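python-3.x - Why do we get blank text when using Selenium on this AJAX-based music website?
Problem description
Hi~ I'm an enthusiastic Python beginner. I recently tried to use Selenium (find_elements_by_css_selector) to scrape the singer names and song names from my own favorites list.
URL: https://www.xiami.com/favorite/88955424
But when I tried it, it failed~ The returned selection lists are empty and I don't know why.
The music website is AJAX-based.
Below is what the empty selections look like in the console. I'm quite sad about it.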
[]
[]
[]
situation(song amount)(singer amount)(album amount): 0 0 0
This is my original script:
from selenium import webdriver
import mysql.connector
import time

class xiami():
    def __init__(self):
        self.url='https://www.xiami.com/favorite/88955424'

    def turn_on_url(self):
        self.browser = webdriver.Chrome()
        self.browser.get(self.url)
        self.browser.maximize_window()
        self.browser.implicitly_wait(8)

    def get_page_data(self):#get infos of singers and songs and albums
        self.song_names=self.browser.find_elements_by_css_selector('div[class="song-name em"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')#song name
        self.singers=self.browser.find_elements_by_css_selector('div[class="singers"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')
        self.albums=self.browser.find_elements_by_css_selector('div[class="album"] a[data-spm-anchor-id="a2oj1.12028340.0.0"]')
        print(self.song_names)
        print(self.singers)
        print(self.albums)
        print('situation(song amount)(singer amount)(album amount):',len(self.song_names),len(self.singers),len(self.albums))

if __name__=='__main__':
    xiami=xiami()
    xiami.turn_on_url()
    xiami.get_page_data()
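Solution
To scrape the singer names together with the song names and albums from your favorites list through Selenium, you can use the following solution. Compared with the original script, it waits explicitly (WebDriverWait with visibility_of_all_elements_located) for the AJAX-loaded elements instead of relying on implicitly_wait, and its locators no longer depend on the data-spm-anchor-id attribute.
Code block: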
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("start-maximized")
driver = webdriver.Chrome(options=chrome_options)
driver.get("https://www.xiami.com/favorite/88955424")
song_names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[contains(@class, 'song-name')]/a")))]
singers = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[@class='singers']/a")))]
albums = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table//tbody//tr[@class='odd' or @class='even']//div[@class='album']/a")))]
for a,b,c in zip(song_names, singers, albums):
    print("Song {} is by {} from {} album.".format(a, b, c))
Console output:
Song Reckless is by Arin Ray from Platinum Fire (Deluxe) album.
Song Grey Area is by Jerry Paper from Like a Baby album.
Song Open Up the Door is by Weyes Blood from Truelove's Gutter album.
Song Looking For Your Love is by Richard Hawley from Looking For Your Love album.
Song Blue Lips is by HUM?NIGHTM?RE from Invitation to Her's album.
Song Nicolo Paganini: Introduction and Variations on Nel cor piu non mi sento from Paisiello's La molinar is by Her's from Paganini: In cor più non mi sento; 3 Duetti; Divertimenti carnevaleschi album.
Song Layin Low is by Niccolò Paganini from MFSB album.
Song Don't Start Givin' Up is by Stefan Milenkovic from Flashes Of Life album.
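Some lines in the console output above pair a song with an unlikely singer or album (for example, Layin Low with Niccolò Paganini). One possible cause, which is an assumption rather than something confirmed here, is that a row can contain more than one singer link, so the three independently collected lists have different lengths and zip() shifts them against each other. A minimal sketch of a row-by-row variant follows. It reuses the XPath fragments from the code block above and assumes the page structure they imply. The names Track, describe and scrape_tracks are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Row locator taken from the XPaths used in the answer above.
ROW_XPATH = "//table//tbody//tr[@class='odd' or @class='even']"


@dataclass
class Track:
    """One table row: a song, all of its singers, and its album (hypothetical type)."""
    song: str
    singers: List[str]
    album: str


def describe(track):
    # Same message format as the answer, with multiple singers joined by commas.
    return "Song {} is by {} from {} album.".format(
        track.song, ", ".join(track.singers), track.album
    )


def scrape_tracks(driver, timeout=20):
    """Collect song, singers and album per row so the fields stay aligned."""
    # Selenium is imported here so that Track and describe() work without it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait until the AJAX-loaded rows are visible, as in the answer.
    rows = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.XPATH, ROW_XPATH))
    )
    tracks = []
    for row in rows:
        # The leading "." makes each XPath relative to the current row.
        # find_element raises NoSuchElementException if a row lacks the cell.
        song = row.find_element(
            By.XPATH, ".//div[contains(@class, 'song-name')]/a"
        ).text
        singers = [
            link.text
            for link in row.find_elements(By.XPATH, ".//div[@class='singers']/a")
        ]
        album = row.find_element(By.XPATH, ".//div[@class='album']/a").text
        tracks.append(Track(song, singers, album))
    return tracks
```

With the driver created in the code block above, calling scrape_tracks(driver) and printing describe(t) for each returned item should give one line per table row, provided the row structure matches the assumption.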