Downloading a file that has no URL using Selenium

Problem description

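I am trying to download an audio file from a web page. I can only listen to the file on the site. I tried to copy the file's URL, but it turns out there is no URL. The file is embedded in an h5 tag, as shown below: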

<h5 class="mp3-1 pum-trigger" style="cursor: pointer;" audio="audio-8649-20_html5">
Hangup to Qualified Lead - Sample 1
</h5>

Can I retrieve the audio file? If so, how do I get it? Here is the link to the site: https://reivault.com/salesninjateam/

Thanks!

Tags: selenium, beautifulsoup, python-requests

Solution


The h5 element's audio attribute holds the id of an audio tag. You can read the src of that audio tag and download it with urllib:

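from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import urllib.request

driver = webdriver.Firefox()
driver.get("https://reivault.com/salesninjateam/")

# Locate the <h5> trigger by its visible text and wait until it is present.
# The find_element_by_* helpers were removed in recent Selenium 4 releases,
# so By locators are used instead.
wait = WebDriverWait(driver, 5)
elem = wait.until(EC.presence_of_element_located((
    By.XPATH, "//h5[contains(text(),'Hangup to Qualified Lead - Sample 1')]")))

# The "audio" attribute holds the id of the matching <audio> element.
audio_id = elem.get_attribute("audio")

audio_elem = driver.find_element(By.ID, audio_id)

# The src of the <audio> element is the direct URL of the file.
src = audio_elem.get_attribute("src")

print(src)

# Download the file and save it locally.
urllib.request.urlretrieve(src, "a.mp3")

# quit() ends the whole browser session, not just the current window.
driver.quit()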
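
The lookup above (the h5 audio attribute gives the id of an audio tag, whose src is the file) does not strictly need a browser if the audio tags are already present in the page's static HTML. That is an assumption; the site may add them with JavaScript, in which case Selenium is still needed. A minimal sketch of the same mapping with the standard-library HTML parser, where `AudioLinkParser`, `audio_urls`, and the sample markup (including its src URL) are made up for illustration:

```python
from html.parser import HTMLParser


class AudioLinkParser(HTMLParser):
    """Collect <h5> "audio" attributes and the src of each <audio> element."""

    def __init__(self):
        super().__init__()
        self.trigger_ids = []  # values of the "audio" attribute on <h5> tags
        self.audio_src = {}    # id of each <audio> element -> its src

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h5" and attrs.get("audio"):
            self.trigger_ids.append(attrs["audio"])
        elif tag == "audio" and attrs.get("id") and attrs.get("src"):
            self.audio_src[attrs["id"]] = attrs["src"]


def audio_urls(html):
    """Return {audio id: src} for every <h5> trigger with a matching <audio>."""
    parser = AudioLinkParser()
    parser.feed(html)
    return {i: parser.audio_src[i]
            for i in parser.trigger_ids if i in parser.audio_src}


# Hypothetical markup modelled on the <h5> from the question; the src is invented.
sample = """
<h5 class="mp3-1 pum-trigger" audio="audio-8649-20_html5">Sample 1</h5>
<audio id="audio-8649-20_html5" src="https://example.com/sample-1.mp3"></audio>
"""
print(audio_urls(sample))
```

In practice the `html` argument would be the page source, for example `driver.page_source` or the body of an HTTP response.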

To download all of the files:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import urllib.request

driver = webdriver.Firefox()
driver.get("https://reivault.com/salesninjateam")

wait = WebDriverWait(driver, 5)
# Collect every <h5> trigger on the page.
elems = wait.until(EC.presence_of_all_elements_located((
    By.XPATH, "//h5[contains(@class,'pum-trigger')]")))

for elem in elems:
    audio_id = elem.get_attribute("audio")
    # Skip triggers that have no "audio" attribute or point at something else.
    if audio_id and "audio" in audio_id:
        print(audio_id)
        audio_elem = wait.until(EC.presence_of_element_located((
            By.ID, audio_id)))
        src = audio_elem.get_attribute("src")
        print(src)
        # Name each file after its audio id so downloads do not overwrite each other.
        urllib.request.urlretrieve(src, audio_id + ".mp3")

# Close the browser only after every file has been downloaded.
driver.quit()
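
Each download in the loop needs its own file name, otherwise later files overwrite earlier ones. One option is to derive the name from the src URL itself. The helper below is a sketch, not part of the original answer: `filename_from_src` and the example URLs are hypothetical, and it assumes the last path segment of the URL is a usable file name.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_src(src, fallback):
    """Return the last path segment of src, or fallback + ".mp3" if it is empty."""
    # urlparse drops the query string from .path; unquote decodes %20 and similar.
    name = posixpath.basename(unquote(urlparse(src).path))
    return name or fallback + ".mp3"


# Invented URLs, used only to show the behaviour.
print(filename_from_src("https://example.com/audio/sample%201.mp3?x=1", "audio-1"))
print(filename_from_src("https://example.com/", "audio-1"))
```

The result can be passed as the second argument to `urllib.request.urlretrieve`, with the element's audio attribute as the fallback.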
