Problem description

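I want to get into department and select/print only name and url. I tried the approach below, but I can't figure out how to get into department and pick out just those two things. How do I get the "name" and "url" of all the links?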

import json
import urllib.request
from bs4 import BeautifulSoup


def getContent():
    # target site url (placeholder; urlopen() needs a scheme such as http://)
    url = "http://www.xyz.com"
    # requesting the url for data
    request = urllib.request.Request(url)
    # get the html, whole page
    htmlpage = urllib.request.urlopen(request).read()
    bsoup = BeautifulSoup(htmlpage, "html.parser")
    # print(bsoup.prettify())

    # main_table = bsoup.find("div",attrs)
    # print(main_table)
    # print(bsoup.find_all('name'))
    # nav = bsoup.nav
    # print(bsoup.title.department.url)
    # for url in find_all('a'):
    # print(url.get('href'))

    for link in bsoup.find_all("a"):
        print("Title: {}".format(link.get("name")))
        print("href: {}".format(link.get("href")))
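
The loop above prints "Title: None" for most links because link.get("name") reads the name attribute of the <a> tag, which ordinary links usually do not have; the visible link text is the tag's content, which BeautifulSoup exposes through link.get_text(). As a minimal sketch of pairing each link's text with its href, the example below uses the standard library's html.parser instead of BeautifulSoup (swapped in only so it runs without bs4). The LinkCollector class and the sample HTML are hypothetical, for illustration only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (text, href) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples; anchors without href are skipped
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # only keep text that sits inside an open <a href=...>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical sample markup, not taken from any real page.
sample = ('<p><a href="/commencement">Commencement</a> '
          '<a href="/counseling-center">Counseling Center</a></p>')
collector = LinkCollector()
collector.feed(sample)
collector.close()
for text, href in collector.links:
    print("Title: {}".format(text))
    print("href: {}".format(href))
```

With BeautifulSoup the equivalent change to the original loop is to print link.get_text(strip=True) instead of link.get("name").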

Tags: python, web-scraping, beautifulsoup, urllib

Solution


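You can get the name/url pairs with the json module. The page embeds its department data as JSON-LD inside a <script type="application/ld+json"> tag, so find that tag with BeautifulSoup and parse its text with json.loads():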

import json
import urllib.request
from bs4 import BeautifulSoup


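def get_content():
    url = "http://www.ucdenver.edu/pages/ucdwelcomepage.aspx"
    # download the raw HTML of the page
    request = urllib.request.Request(url)
    html_page = urllib.request.urlopen(request).read()
    soup = BeautifulSoup(html_page, 'html.parser')

    # the department name/url pairs are read from the JSON-LD block,
    # not from <a> tags, so parse the <script> text as JSON
    json_data = json.loads(soup.find("script", type="application/ld+json").string)
    # each entry of "department" is an object with "name" and "url" keys
    for data in json_data["department"]:
        # left-align the name in a 60-character column, then print the url
        print("{:<60} {}".format(data["name"], data["url"]))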

get_content()

Output:

Center for Undergraduate Exploration and Advising            https://www.ucdenver.edu/center-for-undergraduate-exploration-and-advising
Commencement                                                 https://www.ucdenver.edu/commencement
Counseling Center                                            https://www.ucdenver.edu/counseling-center
First Year Experiences                                       https://www.ucdenver.edu/first-year-experiences
Health Programs                                              https://www.ucdenver.edu/programs/health-programs
Housing and Dining                                           https://www.ucdenver.edu/housing-and-dining
...
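
The script above needs network access and assumes the page has exactly one JSON-LD block whose "department" value is a list. If the tag is missing, soup.find() returns None and the .string access raises AttributeError. Below is a stdlib-only sketch that makes fewer assumptions: it uses html.parser in place of BeautifulSoup (so it runs without bs4), scans every JSON-LD block, accepts "department" as either a single object or a list (JSON-LD allows both shapes), and skips entries missing name or url. The JsonLdExtractor class, the department_links helper and SAMPLE_HTML are hypothetical; the two department entries in the sample are copied from the output above, not from the page's actual markup.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []         # one string per JSON-LD script found
        self._in_jsonld = False  # True while inside a JSON-LD <script>
        self._buf = []           # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # script contents may arrive in several chunks, so buffer them
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def department_links(html):
    """Return (name, url) pairs from the "department" property of every JSON-LD block."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    pairs = []
    for block in extractor.blocks:
        data = json.loads(block)
        # a JSON-LD block may hold one object or a list of objects
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if not isinstance(node, dict):
                continue
            departments = node.get("department", [])
            # a property may hold a single object instead of a list
            if isinstance(departments, dict):
                departments = [departments]
            for dept in departments:
                # skip entries that lack either field
                if isinstance(dept, dict) and dept.get("name") and dept.get("url"):
                    pairs.append((dept["name"], dept["url"]))
    return pairs


# Hypothetical, trimmed stand-in for the real page (not its actual markup).
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"department": [
  {"name": "Commencement", "url": "https://www.ucdenver.edu/commencement"},
  {"name": "Counseling Center", "url": "https://www.ucdenver.edu/counseling-center"},
  {"name": "Entry without a url"}
]}
</script>
</head><body></body></html>"""

for name, url in department_links(SAMPLE_HTML):
    print("{:<60} {}".format(name, url))
```

Against the live page you would pass the downloaded HTML as text, for example department_links(html_page.decode("utf-8")), assuming the page is UTF-8 encoded.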
