Web scraper for data-url in a div in Python (BeautifulSoup)

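Problem description

I don't know why the program does not extract the links from inside the div.

I don't know whether the error is in how the div class is defined or in the code that extracts data-url from the div.

This is the current code: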

import requests 
from bs4 import BeautifulSoup

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = requests.get("https://www.chosic.com/free-music/all/" , headers=header)
soup = BeautifulSoup(url.content, 'lxml')

list = []

music = soup.find_all('div',{'class':'track-audio'})
for i in music:
    i.findAll(['data-url'])
    print(i)

Output:

<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27306" data-url="https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3" id="waveform27306"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25944" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3" id="waveform25944"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="26757" data-url="https://www.chosic.com/wp-content/uploads/2020/11/batchbug-sweet-dreams.mp3" id="waveform26757"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27880" data-url="https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3" id="waveform27880"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27281" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3" id="waveform27281"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="26021" data-url="https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3" id="waveform26021"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27247" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3" id="waveform27247"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27248" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3" id="waveform27248"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27120" data-url="https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3" id="waveform27120"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25860" data-url="https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3" id="waveform25860"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="28703" data-url="https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3" id="waveform28703"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="28923" data-url="https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3" id="waveform28923"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="24515" data-url="https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3" id="waveform24515"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27012" data-url="https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3" id="waveform27012"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25897" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3" id="waveform25897"></div></div>

But I want to extract only the data-url values from those divs.

Example:

https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3
https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3
https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3
https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3
https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3
https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3
https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3
https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3

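Is there any possible solution?

Tags: python-3.x, web-scraping, beautifulsoup, python-requests

Solution

.findAll does not accept CSS selectors: i.findAll(['data-url']) looks for tags named data-url, not for tags that carry a data-url attribute, so it finds nothing. In addition, you are not assigning the result of .findAll to anything, and the loop then prints the whole div i. Try: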

import requests
from bs4 import BeautifulSoup

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}

url = requests.get("https://www.chosic.com/free-music/all/", headers=header)
soup = BeautifulSoup(url.content, "lxml")

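# Each track is wrapped in a <div class="track-audio">
music = soup.find_all("div", {"class": "track-audio"})
for i in music:
    # select_one takes a CSS selector; "[data-url]" matches the first
    # descendant of i that has a data-url attribute
    m = i.select_one("[data-url]")
    print(m["data-url"])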

Prints:

https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3
https://www.chosic.com/wp-content/uploads/2020/11/batchbug-sweet-dreams.mp3
https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3
https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3
https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3
https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3
https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3
https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3
https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3
https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3
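
The same extraction can be tried without hitting the network. The following is only a minimal sketch: SAMPLE_HTML holds two entries copied from the question's output, the function names extract_with_css and extract_with_attrs are made up for illustration, and html.parser is used instead of lxml so that no extra parser needs to be installed. It shows two ways to reach the attribute: a CSS attribute selector via select, and find_all with an attrs filter.

```python
from bs4 import BeautifulSoup

# Static sample: two entries copied from the output shown in the question.
SAMPLE_HTML = """
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27306" data-url="https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3" id="waveform27306"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25944" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3" id="waveform25944"></div></div>
"""


def extract_with_css(html):
    """Hypothetical helper: collect data-url values with a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # Any descendant of div.track-audio that has a data-url attribute
    return [tag["data-url"] for tag in soup.select("div.track-audio [data-url]")]


def extract_with_attrs(html):
    """Hypothetical helper: same result using find_all and an attrs filter."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs={"data-url": True} matches every tag where the attribute is present
    return [tag["data-url"] for tag in soup.find_all(attrs={"data-url": True})]


urls_css = extract_with_css(SAMPLE_HTML)
urls_attrs = extract_with_attrs(SAMPLE_HTML)

for u in urls_css:
    print(u)
```

Both helpers return a list instead of printing inside the loop, which makes it easier to reuse the URLs later, for example to download the files.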
