python-3.x - Web scraper for a data-url inside a div in Python (BeautifulSoup)
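Problem description
I don't know why the program does not extract the links from inside the div.
I can't tell whether the mistake is in how I select the div by class or in the step that extracts the data-url from the div.
This is the current code: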
import requests
from bs4 import BeautifulSoup

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
url = requests.get("https://www.chosic.com/free-music/all/" , headers=header)
soup = BeautifulSoup(url.content, 'lxml')
list = []
music = soup.find_all('div',{'class':'track-audio'})
for i in music:
    i.findAll(['data-url'])
    print(i)
Output:
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27306" data-url="https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3" id="waveform27306"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25944" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3" id="waveform25944"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="26757" data-url="https://www.chosic.com/wp-content/uploads/2020/11/batchbug-sweet-dreams.mp3" id="waveform26757"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27880" data-url="https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3" id="waveform27880"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27281" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3" id="waveform27281"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="26021" data-url="https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3" id="waveform26021"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27247" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3" id="waveform27247"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27248" data-url="https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3" id="waveform27248"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27120" data-url="https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3" id="waveform27120"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25860" data-url="https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3" id="waveform25860"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="28703" data-url="https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3" id="waveform28703"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="28923" data-url="https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3" id="waveform28923"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="24515" data-url="https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3" id="waveform24515"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27012" data-url="https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3" id="waveform27012"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25897" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3" id="waveform25897"></div></div>
But I want to extract only the data-url value from each of those divs.
Example:
https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3
https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3
https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3
https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3
https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3
https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3
https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3
https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3
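Is there any possible solution?
Solution
.findAll does not accept CSS selectors: a list passed to it is matched against tag names, so ['data-url'] looks for <data-url> tags rather than the data-url attribute. Also, you are not assigning the output of .findAll to anything, so the loop just prints the whole div. Try: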
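import requests
from bs4 import BeautifulSoup

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
url = requests.get("https://www.chosic.com/free-music/all/", headers=header)
soup = BeautifulSoup(url.content, "lxml")

# Each track is wrapped in <div class="track-audio">.
music = soup.find_all("div", {"class": "track-audio"})
for i in music:
    # select_one takes a CSS selector; [data-url] matches the first
    # descendant that carries a data-url attribute.
    m = i.select_one("[data-url]")
    # Attribute values are read with dictionary-style access.
    print(m["data-url"])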
Prints:
https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3
https://www.chosic.com/wp-content/uploads/2020/11/batchbug-sweet-dreams.mp3
https://www.chosic.com/wp-content/uploads/2021/04/Luke-Bergs-Bliss.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Warm-Memories-Emotional-Inspiring-Piano.mp3
https://www.chosic.com/wp-content/uploads/2020/08/fm-freemusic-give-me-a-smile.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Monkeys-Spinning-Monkeys.mp3
https://www.chosic.com/wp-content/uploads/2021/02/Fluffing-a-Duck.mp3
https://www.chosic.com/wp-content/uploads/2021/01/fm-freemusic-inspiring-optimistic-upbeat-energetic-guitar-rhythm.mp3
https://www.chosic.com/wp-content/uploads/2020/07/alexander-nakarada-superepic.mp3
https://www.chosic.com/wp-content/uploads/2021/08/An-Epic-Story.mp3
https://www.chosic.com/wp-content/uploads/2021/08/scott-buckley-jul.mp3
https://www.chosic.com/wp-content/uploads/2020/06/John_Bartmann_-_02_-_Happy_African_Village.mp3
https://www.chosic.com/wp-content/uploads/2021/01/春のテーマ-Spring-field-.mp3
https://www.chosic.com/wp-content/uploads/2020/07/Brandenburg-Concerto-no.-3-BWV-1048-Complete-Performance.mp3
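To make the difference between the two calls easier to see, here is a minimal offline sketch. It parses two rows copied from the output in the question instead of fetching the live page, and it uses the built-in "html.parser" rather than lxml so that it runs without extra dependencies. The variable names (by_tag_name, urls_find_all, urls_select) are made up for illustration, and the find_all(attrs=...) form is an alternative to the select_one approach above, not something the original answer used.

```python
from bs4 import BeautifulSoup

# Two rows copied from the output shown in the question, used as offline input.
html = """
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="27306" data-url="https://www.chosic.com/wp-content/uploads/2021/02/happy-clappy-ukulele.mp3" id="waveform27306"></div></div>
<div class="track-audio"><div class="waveform before" data-saved="yes" data-track="25944" data-url="https://www.chosic.com/wp-content/uploads/2020/07/Art-Of-Silence_V2.mp3" id="waveform25944"></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# A list passed to find_all is matched against tag NAMES, so this searches
# for <data-url> elements, which do not exist in the markup.
by_tag_name = soup.find_all(["data-url"])

# Alternative without CSS: attrs={"data-url": True} matches any tag that
# has a data-url attribute, whatever its value.
urls_find_all = [tag["data-url"] for tag in soup.find_all(attrs={"data-url": True})]

# CSS selector form: any descendant of div.track-audio carrying data-url.
urls_select = [tag["data-url"] for tag in soup.select("div.track-audio [data-url]")]

print(by_tag_name)
for link in urls_select:
    print(link)
```

On the live page, a div without a data-url descendant would make select_one return None, so collecting the matches with select or find_all as in this sketch may be a slightly safer pattern than indexing the result of select_one directly.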