How to extract only links that contain "https" using BeautifulSoup?

Problem description

import requests
from bs4 import BeautifulSoup

# Download the page and parse it
page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')

# Print the href of every <a> tag that has one
for link in soup.find_all('a', href=True):
    print(link['href'])

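Code output:

[screenshot of the code output]

I only need the links that contain https, not the entries marked with rectangles in the screenshot.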

Tags: python-3.x, web-scraping, beautifulsoup

Solution


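You can use the .select method with a CSS selector. The selector a[href^="https://"] matches only <a> tags whose href attribute starts with https://: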

import requests
from bs4 import BeautifulSoup

page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')

# ^= is the CSS "attribute starts with" operator
for link in soup.select('a[href^="https://"]'):
    print(link['href'])

Prints:

https://merchant.evaly.com.bd/
https://www.facebook.com/groups/EvalyHelpDesk/
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/campaign/shop/samsung-note-20-for-hot-deal/samsung-note20-for-hot-deal-058bbc
https://evaly.com.bd/premium-deal
https://evaly.com.bd/campaign/shop/rancon-motors-for-mega-deal-pod/rancon-motors-for-mega-deal-pod-be211b
https://evaly.com.bd/premium-deal
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://play.google.com/store/apps/details?id=bd.com.evaly.evalyshop
https://apps.apple.com/app/id1504042677
https://www.facebook.com/evaly.com.bd/
https://www.instagram.com/evaly.com.bd/
https://www.youtube.com/channel/UCYxO44JS4_6CLXFKVmZJ7Vg
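
The output above lists some links more than once (for example https://evaly.com.bd/hot-deal). If that is unwanted, one possible follow-up is to filter and de-duplicate the href strings in plain Python instead of in the selector. The sketch below is only an illustration: the helper name unique_https_links and the sample list are hypothetical and not part of the original answer, and it uses only the standard library so it can run without network access.

```python
from urllib.parse import urlsplit


def unique_https_links(hrefs):
    """Return the https URLs from hrefs in first-seen order, without duplicates.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """
    seen = set()
    result = []
    for href in hrefs:
        href = href.strip()
        # urlsplit() lowercases the scheme, so "HTTPS://..." is accepted too;
        # relative links such as "/hot-deal" have an empty scheme and are skipped
        if urlsplit(href).scheme != "https":
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


if __name__ == "__main__":
    # Made-up sample input; with a parsed page it could instead be
    # [a["href"] for a in soup.find_all("a", href=True)]
    sample = [
        "https://evaly.com.bd/",
        "/hot-deal",
        "https://evaly.com.bd/hot-deal",
        "https://evaly.com.bd/",
        "http://example.com/",
    ]
    for url in unique_https_links(sample):
        print(url)
```

Compared with the CSS selector, this variant checks the parsed URL scheme rather than the raw string prefix, and it keeps only the first occurrence of each link.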
