How do I extract certain content from a URL?

Problem description

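I'm a student working on an assignment. I've been asked to use the BeautifulSoup library to parse a page (https://www.edb.gov.hk/en/about-edb/press/press-releases/index.html) and extract a table or list, then store the data in a Python list, dict, or pandas DataFrame (this is a requirement).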

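I managed to extract the links and titles with a for loop over the "a" tags and their "href" attributes. However, I don't know how to extract the dates from the page.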

Could someone give me some advice, using "div:nth-of-type" or another method?

Tags: python, beautifulsoup

Solution


To get the dates, titles, and links into a DataFrame, you can use the following example:

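import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.edb.gov.hk/en/about-edb/press/press-releases/index.html"
# Download the page and parse it with the built-in HTML parser
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_data = []
# Keep only the rows that contain .table_text_mobile cells
for row in soup.select(".circulars_result_row:has(.table_text_mobile)"):
    # Stripped text of each cell in the row (here: date, then title)
    data = [d.get_text(strip=True) for d in row.select(".table_text_mobile")]
    # Append the href of the first <a> in the row as the link
    all_data.append(data + [row.a["href"]])

df = pd.DataFrame(all_data, columns=["Date", "Title", "Link"])
print(df)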

Prints:

           Date                                                                                                                Title                                                                         Link
0   11 Oct 2021  EDB provides latest guidelines on display of national flag and conduct of national flag raising ceremony in schools  https://www.info.gov.hk/gia/general/202110/11/P2021101100304.htm?fontSize=1
1   11 Oct 2021                                                   Hong Kong Scholarship for Excellence Scheme opens for applications  https://www.info.gov.hk/gia/general/202110/11/P2021100800399.htm?fontSize=1
2   11 Oct 2021                                                                              Profiles of kindergartens posted online             https://www.info.gov.hk/gia/general/202110/11/P2021101100297.htm
3   05 Oct 2021                                           "Active Students, Active People" Campaign cum e-Gallery launching ceremony  https://www.info.gov.hk/gia/general/202110/05/P2021100400564.htm?fontSize=1
4   30 Sep 2021                                                                             EDB launches "SENSE" information website  https://www.info.gov.hk/gia/general/202109/30/P2021092900520.htm?fontSize=1
5   23 Sep 2021                                          Launch of School Nominations Direct Admission Scheme for local universities  https://www.info.gov.hk/gia/general/202109/23/P2021092300346.htm?fontSize=1
6   21 Sep 2021                                 Study Subsidy Scheme for Designated Professions/Sectors for 2022/23 cohort announced             https://www.info.gov.hk/gia/general/202109/21/P2021092000818.htm
7   20 Sep 2021                                                  EDB announces arrangements for student grant of 2021/22 school year             https://www.info.gov.hk/gia/general/202109/20/P2021092000578.htm
8   16 Sep 2021                                             Parents reminded to submit application form for admission to Primary One  https://www.info.gov.hk/gia/general/202109/16/P2021091600299.htm?fontSize=1
9   13 Sep 2021                      Junior Secondary History e-Reading Award Scheme fosters students' positive values and attitudes  https://www.info.gov.hk/gia/general/202109/13/P2021091300317.htm?fontSize=1
10  03 Sep 2021                                                                                         SED on Primary One admission  https://www.info.gov.hk/gia/general/202109/03/P2021090300473.htm?fontSize=1
11  02 Sep 2021                               EDB introduces newly developed Curriculum Framework on Parent Education (Kindergarten)  https://www.info.gov.hk/gia/general/202109/02/P2021090200238.htm?fontSize=1
12  01 Sep 2021                                                                       Appointments to Curriculum Development Council             https://www.info.gov.hk/gia/general/202109/01/P2021090100172.htm
13  01 Sep 2021                                                                                       SED speaks on first school day             https://www.info.gov.hk/gia/general/202109/01/P2021090100352.htm
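
If you want to try the selectors without hitting the site, here is a minimal offline sketch. The class names come from the code above, but SAMPLE_HTML is a made-up fragment (its layout, the example.com links, and the helper names parse_rows and first_cell_dates are all assumptions for illustration, not the real page structure). It shows how :has() skips rows without data cells, and how the "div:nth-of-type" idea from the question could pick out the date cell, assuming the date is the first div in each row.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one header row without data cells, then two data rows.
# The real page may be structured differently.
SAMPLE_HTML = """
<div class="circulars_result_table">
  <div class="circulars_result_row">
    <div>Date</div><div>Title</div>
  </div>
  <div class="circulars_result_row">
    <div class="table_text_mobile">11 Oct 2021</div>
    <div class="table_text_mobile"><a href="https://example.com/a.htm">First release</a></div>
  </div>
  <div class="circulars_result_row">
    <div class="table_text_mobile">05 Oct 2021</div>
    <div class="table_text_mobile"><a href="https://example.com/b.htm">Second release</a></div>
  </div>
</div>
"""


def parse_rows(html):
    """Return one dict per data row, skipping rows with no data cells."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    # :has(...) keeps only rows containing at least one .table_text_mobile
    for row in soup.select(".circulars_result_row:has(.table_text_mobile)"):
        cells = [c.get_text(strip=True) for c in row.select(".table_text_mobile")]
        records.append({"Date": cells[0], "Title": cells[1], "Link": row.a["href"]})
    return records


def first_cell_dates(html):
    """Alternative using nth-of-type: take the first child div of each data row."""
    soup = BeautifulSoup(html, "html.parser")
    selector = ".circulars_result_row:has(.table_text_mobile) > div:nth-of-type(1)"
    return [d.get_text(strip=True) for d in soup.select(selector)]


records = parse_rows(SAMPLE_HTML)
dates = first_cell_dates(SAMPLE_HTML)
print(records)
print(dates)
```

A list of dicts like records already satisfies the "list or dict" part of the assignment, and passing it to pd.DataFrame(records) gives a DataFrame with Date, Title, and Link columns.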
