Excel VBA - Get all href links from a website

Problem description


Hello, I hope someone can help me. On this example link: https://www.academiadasapostas.com/stats/competition/brasil/26

I want to get all the href links that the "VS" entries point to. I am trying something like this:

Sub ScrapeScores()

Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLTable As MSHTML.IHTMLElement
Dim HTMLDiv As MSHTML.IHTMLElement
Dim TableSection As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
Dim TableCell As MSHTML.IHTMLElement
Dim RowText As String



IE.Visible = True
IE.navigate "https://www.academiadasapostas.com/stats/competition/brasil/26"

Do While IE.readyState <> READYSTATE_COMPLETE Or IE.Busy
Loop

Set HTMLDoc = IE.document
Set HTMLDiv = HTMLDoc.getElementById("competition-round-group-0")
Set HTMLTables = HTMLDiv.getElementsByTagName("a")

For Each HTMLTable In HTMLTables
    Debug.Print HTMLTable.ID, "&", HTMLTable.className
    
    For Each TableSection In HTMLTable.Children
        Debug.Print , TableSection.tagName
        
    Next TableSection
    
Next HTMLTable


End Sub

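But it did not work. I think I could use a CSS selector with querySelectorAll, right? Since IE is being retired, switching to CSS would be good.

Thanks in advance for any answers.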

Tags: excel, vba, web-scraping, export-to-excel

Solution


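You can use the CSS pattern .competition-rounds td:nth-child(4) > a with querySelectorAll. The td:nth-child(4) part selects the fourth cell of each row (the fourth column) inside the element with class competition-rounds, and > a then selects the a tags that are direct children of those cells. Loop over the returned nodeList and extract the href attribute from each node.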


Required references:

  1. Microsoft Internet Controls
  2. Microsoft HTML Object Library

Option Explicit

Public Sub PrintLinks()
    Dim ie As SHDocVw.InternetExplorer, nodeList As MSHTML.IHTMLDOMChildrenCollection
    Dim i As Long

    Set ie = New SHDocVw.InternetExplorer

    With ie

        .Visible = True
        .Navigate2 "https://www.academiadasapostas.com/stats/competition/brasil/26"

        ' Wait until the page has finished loading
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        ' Anchors that are direct children of the 4th cell in each row
        Set nodeList = .Document.querySelectorAll(".competition-rounds td:nth-child(4) > a")

        For i = 0 To nodeList.Length - 1

            Debug.Print nodeList.Item(i).href

        Next i

        ' Pauses execution so the output can be inspected; remove when no longer needed
        Stop

        .Quit
    End With
End Sub
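
Since the question is tagged export-to-excel, here is a minimal sketch of the same loop writing the links to a worksheet instead of the Immediate window. It reuses the IE automation and selector from PrintLinks above and needs the same two references. The Sub name WriteLinksToSheet and the destination (column A of the active sheet, starting at row 1) are assumptions made for illustration, not part of the original answer.

```vba
' Sketch only: WriteLinksToSheet is a hypothetical name, and writing to
' column A of the active sheet is an assumed destination.
Public Sub WriteLinksToSheet()
    Dim ie As SHDocVw.InternetExplorer
    Dim nodeList As MSHTML.IHTMLDOMChildrenCollection
    Dim i As Long

    Set ie = New SHDocVw.InternetExplorer

    With ie
        .Visible = True
        .Navigate2 "https://www.academiadasapostas.com/stats/competition/brasil/26"

        ' Wait until the page has finished loading
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        ' Same selector as in PrintLinks
        Set nodeList = .Document.querySelectorAll(".competition-rounds td:nth-child(4) > a")

        ' nodeList is zero-based; worksheet rows are one-based
        For i = 0 To nodeList.Length - 1
            ActiveSheet.Cells(i + 1, 1).Value = nodeList.Item(i).href
        Next i

        .Quit
    End With
End Sub
```

This assumes the active sheet is a regular worksheet; existing values in column A would be overwritten, so you may prefer to point it at a specific sheet.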

Reading:

  1. :nth-child()
  2. Child combinator
