VBA Web Scraping - Scraping a list of "hrefs"

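Problem description

I would like to scrape a list of names that are contained in clickable links. However, I do not get any result. In a second step, I would like to create a new tab for each title.

It would be great if someone could give me a hint about what is wrong with my code and how to optimize it.

Thanks in advance for your help!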


Option Explicit

Sub Teams()

    Dim IE As SHDocVw.InternetExplorer
    Dim HTMLdoc As MSHTML.HTMLDocument
    Dim li_all As MSHTML.IHTMLElementCollection
    Dim li_single As Object
    Dim i As Long
    
    Set IE = New SHDocVw.InternetExplorer
    IE.Visible = False
    IE.Navigate "https://www.examplexyz.de/"
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    Loop
    Application.Wait (Now + TimeValue("0:00:07"))
    
    Set HTMLdoc = IE.Document
    Set li_all = HTMLdoc.getElementsByClassName("icon_holder")
    
    For i = 0 To li_all.Length - 1
    
        li_single = li_all(0).getElementsByTagName("li").Item(i).innerText
        Debug.Print li_single
          
    Next
    
    IE.Quit

End Sub

'Helper function to get a child (of `obj`) element's text using its className
'  (only handles a single instance but could be extended)
Function classText(obj As Object, classname As String) As String
    Dim els As Object
    Set els = obj.getElementsByClassName(classname)
    If els.Length > 0 Then
        classText = els(0).innerText
    Else
        classText = "[not found]"
    End If
End Function

Tags: vba, web-scraping, getelementsbyclassname

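Solution

I found a way to read all link texts using querySelectorAll. However, I currently get the same link text several times, because the link texts occur again at the lower (nested) levels.

How can I read out only the first level of "li"?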

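Sub Neu()

    Dim objIE As InternetExplorer, nodeList As Object
    Dim OutputString As String, currentItem As Long

    Set objIE = New InternetExplorer

    objIE.Visible = False
    objIE.Navigate "https://www.examplexyz.de/"

    ' Wait for the page to load; DoEvents keeps Excel responsive while polling
    Do While objIE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    ' Extra pause in case the page fills in content after loading
    Application.Wait (Now + TimeValue("0:00:05"))

    ' Every link anywhere below div.icon_holder (nested levels included)
    Set nodeList = objIE.Document.querySelectorAll("div.icon_holder a")

    For currentItem = 0 To nodeList.Length - 1
        ' Read the visible link text explicitly instead of relying on
        ' the element's default property
        OutputString = nodeList.Item(currentItem).innerText
        Debug.Print currentItem & " " & OutputString
    Next currentItem

    ' Close the hidden browser instance so it does not stay in memory
    objIE.Quit

End Sub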
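
One possible way to get only the first level is the CSS child combinator (">"), which matches direct children only. The selector "div.icon_holder a" uses the descendant combinator, so it also picks up links in nested lists. The sketch below is untested against the real page and assumes markup of the form div.icon_holder > ul > li > a, with the nested lists sitting inside each first-level "li". That structure and the Sub name FirstLevelLinks are assumptions, so the selector may need adjusting to the actual HTML. It also assumes a reference to "Microsoft Internet Controls" is set, as in the code above.

```vba
Option Explicit

' Sketch: print only the link texts of the first-level <li> elements.
' ASSUMPTION: the markup is div.icon_holder > ul > li > a; adjust the
' selector if the real page is structured differently.
Sub FirstLevelLinks()

    Dim objIE As InternetExplorer
    Dim nodeList As Object
    Dim currentItem As Long

    Set objIE = New InternetExplorer
    objIE.Visible = False
    objIE.Navigate "https://www.examplexyz.de/"

    ' Wait until the browser reports that the page has finished loading
    Do While objIE.Busy Or objIE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' ">" selects direct children only, so links inside nested
    ' <ul> elements further down are not matched
    Set nodeList = objIE.Document.querySelectorAll("div.icon_holder > ul > li > a")

    For currentItem = 0 To nodeList.Length - 1
        Debug.Print currentItem & " " & nodeList.Item(currentItem).innerText
    Next currentItem

    objIE.Quit

End Sub
```

If the links are wrapped in additional elements (for example a "span" between "li" and "a"), each extra level would have to appear in the selector as well, since every ">" step only goes one level down.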
