BeautifulSoup: how to find all href links inside a div with a given class

Problem description

On disboard.org, I am trying to collect every href inside the divs that have the class "server-name".
Source code:

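import requests
from bs4 import BeautifulSoup

def scrape():
    url = 'https://disboard.org/search?keyword=hacking'
    response = requests.get(url).content
    soup = BeautifulSoup(response, 'html.parser')
    areas = soup.find_all('div', class_='server-name')
    for area in areas:
        # Looks up href on the div itself, which is where the None values come from
        print(area.get('href'))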

Calling this function prints None for every div instead of the links. Example:

None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None

Tags: python, html, beautifulsoup

Solution


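The href attribute belongs to the <a> tag nested inside each div, not to the div itself, so area.get('href') returns None. Replace area.get('href') with: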

area.find('a').attrs['href']
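
The difference can be reproduced offline on a small HTML fragment. The sketch below is only an illustration: SAMPLE_HTML is made-up markup that imitates the nesting implied by the answer (it is not copied from disboard.org), and extract_hrefs is a hypothetical helper name. It also skips divs that contain no link, a case in which area.find('a') would return None and the .attrs lookup would raise an AttributeError.

```python
from bs4 import BeautifulSoup

# Made-up markup (an assumption, not the real page): the link is a child <a>
# of each div, so the div itself carries no href attribute.
SAMPLE_HTML = """
<div class="server-name"><a href="/server/1">First</a></div>
<div class="server-name"><a href="/server/2">Second</a></div>
<div class="server-name">No link here</div>
"""

def extract_hrefs(html):
    """Return the href of the first <a href=...> in each div.server-name."""
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = []
    for area in soup.find_all('div', class_='server-name'):
        link = area.find('a', href=True)  # None when the div has no <a> with an href
        if link is not None:
            hrefs.append(link['href'])
    return hrefs

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
first_div = soup.find('div', class_='server-name')
print(first_div.get('href'))        # the div has no href of its own
print(extract_hrefs(SAMPLE_HTML))   # hrefs taken from the nested <a> tags
```

On this fragment the first print shows None, matching the behaviour in the question, while the helper returns only the links found in the nested tags.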

Full code

import requests
from bs4 import BeautifulSoup

def scrape():
    url = 'https://disboard.org/search?keyword=hacking'
    response = requests.get(url).content
    soup = BeautifulSoup(response, 'html.parser')
    areas = soup.find_all('div', class_='server-name')
    for area in areas:
        print(area.find('a').attrs['href'])


if __name__ == '__main__':
    scrape()

Output

/server/484696439063314482
/server/560847285874065408
/server/715563459739385886
/server/720783958966796309
/server/471545766134153237
/server/733350720690061383
/server/653642434948890626
/server/589905664277610521
/server/729633522565775381
/server/734257173890334832
/server/637702746954530865
/server/326839256758616068
/server/495986950478757891
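
The values printed above are site-relative paths rather than full URLs. If absolute links are needed, one possible approach is urljoin from the standard library. This is a minimal sketch that assumes the paths are relative to the site root https://disboard.org/; BASE_URL and to_absolute are names introduced here for illustration.

```python
from urllib.parse import urljoin

# Assumption: the scraped hrefs are relative to the site root.
BASE_URL = 'https://disboard.org/'

def to_absolute(hrefs, base=BASE_URL):
    """Resolve each relative href against the base URL."""
    return [urljoin(base, href) for href in hrefs]

# Two of the paths from the output above
relative = ['/server/484696439063314482', '/server/560847285874065408']
for url in to_absolute(relative):
    print(url)
```

urljoin leaves hrefs that are already absolute unchanged, so the same helper can be applied to a mixed list.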
