python-3.x - How can I extract only the links that contain "https" using BeautifulSoup?
Problem description
import requests
from bs4 import BeautifulSoup

page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')
# Print the href of every <a> tag that has an href attribute
for link in soup.find_all('a', href=True):
    print(link['href'])
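Result of the code:
I only need the links that contain https, not the ones marked with rectangles in the image.
Solution
You can use the .select method with a CSS selector. The attribute selector a[href^="https://"] matches only <a> tags whose href value starts with https://, so relative links and http:// links are skipped: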
import requests
from bs4 import BeautifulSoup

page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')
# ^= is the CSS "attribute value starts with" operator
for link in soup.select('a[href^="https://"]'):
    print(link['href'])
Prints:
https://merchant.evaly.com.bd/
https://www.facebook.com/groups/EvalyHelpDesk/
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/campaign/shop/samsung-note-20-for-hot-deal/samsung-note20-for-hot-deal-058bbc
https://evaly.com.bd/premium-deal
https://evaly.com.bd/campaign/shop/rancon-motors-for-mega-deal-pod/rancon-motors-for-mega-deal-pod-be211b
https://evaly.com.bd/premium-deal
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://play.google.com/store/apps/details?id=bd.com.evaly.evalyshop
https://apps.apple.com/app/id1504042677
https://www.facebook.com/evaly.com.bd/
https://www.instagram.com/evaly.com.bd/
https://www.youtube.com/channel/UCYxO44JS4_6CLXFKVmZJ7Vg
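
To try the selector without hitting the network, here is a minimal sketch that runs the same idea against an inline HTML string. The SAMPLE_HTML markup and the two helper function names are hypothetical, made up for illustration; only BeautifulSoup's select and find_all calls come from the answer and question above. The second helper shows an equivalent filter written with find_all and str.startswith, in case you prefer plain Python over a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<a href="https://example.com/">absolute https link</a>
<a href="http://example.com/">absolute http link</a>
<a href="/hot-deal">relative link</a>
<a>anchor without href</a>
"""


def https_links_css(html):
    """Return href values starting with "https://", using a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # ^= matches attribute values that begin with the given string
    return [a["href"] for a in soup.select('a[href^="https://"]')]


def https_links_filter(html):
    """Same result, using find_all plus a plain string check."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips <a> tags that have no href attribute at all
    anchors = soup.find_all("a", href=True)
    return [a["href"] for a in anchors if a["href"].startswith("https://")]


if __name__ == "__main__":
    for href in https_links_css(SAMPLE_HTML):
        print(href)
```

Both helpers keep duplicates and document order, which is why the output above lists some URLs more than once. If you need each URL only once, wrapping the result in dict.fromkeys(...) is one way to drop repeats while preserving order.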