Question: Get all hrefs from BeautifulSoup content

Problem description

I want to get all the href links with the code below, but I only get the first href. I can't figure out where I went wrong. Can you help me solve this?

for i in range(1,3): 
    url = "https://www.gittigidiyor.com/samsung-cep-telefonu?sf=" + str(i)
    r = requests.get(url) 
    source = BeautifulSoup(r.content,"lxml")
    liste = source.find_all('div', attrs={"class":"gg-w-24 gg-d-24 gg-t-24 gg-m-24 root-column padding-none"}) 
    for url in liste:
        url_phone = "https:" + url.a.get("href")

        print(url_phone)

Tags: python, web-scraping, beautifulsoup

Solution


You need to call find_all('a') and iterate over the results. Using find('a') or .a only grabs the first <a> tag it finds.

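import requests
from bs4 import BeautifulSoup

for i in range(1, 3):
    # Build the URL for each results page (?sf=1, ?sf=2).
    url = "https://www.gittigidiyor.com/samsung-cep-telefonu?sf=" + str(i)
    r = requests.get(url)
    source = BeautifulSoup(r.content, "lxml")
    liste = source.find_all('div', attrs={"class": "gg-w-24 gg-d-24 gg-t-24 gg-m-24 root-column padding-none"})
    # Use a separate loop variable so the page URL above is not shadowed.
    for container in liste:
        # find_all returns every <a> that has an href, not just the first one.
        all_hrefs = container.find_all('a', href=True)
        for href in all_hrefs:
            # As in the question, the hrefs are assumed to start with "//".
            url_phone = "https:" + href['href']
            print(url_phone)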
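
To see the difference between .a and find_all('a') without hitting the live site, here is a minimal sketch on an inline HTML snippet. The markup, the example.com links, and the helper names first_href and all_hrefs are made up for illustration and are not taken from the real page. It uses the built-in "html.parser" so lxml is not required, and urljoin from the standard library as one possible alternative to prepending "https:" by hand.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one container div on the page.
HTML = """
<div class="root-column">
  <a href="//example.com/phone-1">Phone 1</a>
  <a href="//example.com/phone-2">Phone 2</a>
  <a name="no-href">Anchor without an href</a>
</div>
"""

def first_href(container):
    # .a is shorthand for find('a'): only the first <a> descendant is returned.
    return container.a.get("href")

def all_hrefs(container, base_url="https://example.com/"):
    # href=True skips <a> tags that have no href attribute at all.
    # urljoin resolves protocol-relative links ("//host/path") against base_url.
    return [urljoin(base_url, a["href"]) for a in container.find_all("a", href=True)]

soup = BeautifulSoup(HTML, "html.parser")
container = soup.find("div", class_="root-column")

print(first_href(container))   # a single link
print(all_hrefs(container))    # every link in the container
```

The first call returns a single string, while the second returns a list with one entry per <a> tag that carries an href, which is the behaviour the question was after.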
