How do I parse the links in an HTML page?

Question

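I want to parse the list of links on this website.

I am trying to do this with the requests library in Python. However, when I read the HTML with bs4, there are no links in it, only an empty ul: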

<ul class="ais-Hits-list"></ul>

How can I get these links?

Edit: the code I have tried so far:

import requests
from bs4 import BeautifulSoup

link = "https://www.over-view.com/digital-index/"
r = requests.get(link)
soup = BeautifulSoup(r.content, 'lxml')

Tags: python, html, beautifulsoup

Answer


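Because the links on this site are loaded dynamically by JavaScript, requests only receives the initial HTML with the empty list. You can use selenium to render the page in a real browser and collect the links from there: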

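import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
# Chrome expects width and height separated by a comma
chrome_options.add_argument("--window-size=1920,1080")

# Path to the chromedriver binary; adjust for your system
path_to_chromedriver = 'chromedriver'
# Selenium 4 style; older releases took executable_path= and chrome_options= instead
driver = webdriver.Chrome(service=Service(path_to_chromedriver), options=chrome_options)

driver.get('https://www.over-view.com/digital-index/')

# Give the page's JavaScript time to populate the list
time.sleep(5)

soup = BeautifulSoup(driver.page_source, "lxml")
rows = soup.select("ul.ais-Hits-list > li > a")

for row in rows:
    print(row.get('href'))

# Close the browser once the page source has been parsed
driver.quit()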

Sample output:

/overviews/adelaide-canola-flowers
/overviews/adelaide-rift-complex
/overviews/adriatic-tankers
/overviews/adventuredome
/overviews/agricultural-development
/overviews/agricultural-development
/overviews/agricultural-development
/overviews/agriculture-development
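
A fixed time.sleep(5) can be too short on a slow connection and wastes time on a fast one. As a rough sketch of an alternative, the version below waits explicitly until the first result link appears. The function names extract_links and fetch_rendered_html, and the 15-second timeout, are assumptions made for illustration and are not part of the original answer; it also assumes a chromedriver that Selenium can find on its own.

```python
from bs4 import BeautifulSoup

# Same selector as in the answer above
LINK_SELECTOR = "ul.ais-Hits-list > li > a"


def extract_links(html):
    """Return the href of every result link found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select(LINK_SELECTOR) if a.get("href")]


def fetch_rendered_html(url, timeout=15):
    """Open url in Chrome and return the page source once a result link exists.

    The timeout (in seconds) is an arbitrary illustrative default.
    """
    # Imported here so extract_links can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes chromedriver is discoverable, e.g. on PATH
    try:
        driver.get(url)
        # Blocks until at least one matching link is in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, LINK_SELECTOR))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

With this in place, extract_links(fetch_rendered_html("https://www.over-view.com/digital-index/")) would return the hrefs as a list. Keeping the parsing step separate from the browser step also means the selector can be checked against a saved HTML snippet without launching Chrome.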
