Parsing SoundCloud followers with BeautifulSoup

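Problem description

I am trying to parse a SoundCloud page and get the links and usernames from an account's "followers" page.

I have tried the following, but I don't get any of the links I want: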

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path="C:\\Users\\marco\\Downloads\\geckodriver-v0.23.0-win64\\geckodriver.exe")

driver.get("https://soundcloud.com/marco-valencia/followers")
# name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(driver.page_source, "html.parser")

print(soup.find_all("a"))

I want to find all "a" elements with the class "userBadgeListItem__image" and extract the href strings and the corresponding links.

Tags: python-3.x, web-scraping, beautifulsoup

Solution


To have BeautifulSoup find the "a" elements that carry the class:

soup.find_all("a", class_="userBadgeListItem__image")
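As a minimal sketch of how that call can be used to pull out the links, the snippet below runs it against a small hand-written HTML fragment. The fragment, the sample hrefs, the BASE_URL constant, and the extract_links helper are made up for illustration; the real SoundCloud markup may differ, and on the live page the HTML would come from driver.page_source once the followers have loaded.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<ul>
  <li><a class="userBadgeListItem__image" href="/user-one"></a>
      <a class="userBadgeListItem__heading" href="/user-one">User One</a></li>
  <li><a class="userBadgeListItem__image" href="/user-two"></a>
      <a class="userBadgeListItem__heading" href="/user-two">User Two</a></li>
  <li><a class="unrelated" href="/about">About</a></li>
</ul>
"""

# Assumed base URL used to resolve relative hrefs.
BASE_URL = "https://soundcloud.com"


def extract_links(html, css_class):
    """Return absolute hrefs of all <a> elements carrying css_class."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.find_all("a", class_=css_class)
    # href values in the markup may be relative, so resolve them against BASE_URL.
    return [urljoin(BASE_URL, a["href"]) for a in anchors if a.has_attr("href")]


links = extract_links(SAMPLE_HTML, "userBadgeListItem__image")
for link in links:
    print(link)
```

Unlike Selenium's get_attribute('href'), BeautifulSoup returns the attribute exactly as written in the markup, which is why the sketch resolves it with urljoin.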

But for now let's use Selenium alone. Elements with the class userBadgeListItem__image have no anchor text, so change it to userBadgeListItem__heading:

import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver.get("https://..............")

# scroll down to load all followers
while True:
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    time.sleep(3)  # wait for the AJAX request
    try:
        # keep looping until the loading element is removed from the page
        driver.find_element(By.CSS_SELECTOR, 'div.loading.regular.m-padded')
    except NoSuchElementException:
        break

# finally extract the followers
followers = driver.find_elements(By.CLASS_NAME, 'userBadgeListItem__heading')
for f in followers:
    print('%s: %s' % (f.text, f.get_attribute('href')))
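If the loading element's selector ever changes, the scroll loop could instead stop when the page height no longer grows. The following is only a sketch of that alternative, not the method used in the answer: the helper name scroll_until_stable and its pause and max_rounds parameters are hypothetical, and it assumes only that the driver object offers execute_script, as Selenium WebDriver instances do.

```python
import time


def scroll_until_stable(driver, pause=3.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give the AJAX request time to finish
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, assume the list is complete
        last_height = new_height
    return rounds
```

Calling scroll_until_stable(driver) before extracting the followers would take the place of the while loop; the max_rounds cap is there so a page that keeps growing cannot loop forever.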
