Excel VBA webscrape: how do I get the span value?

Question

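I'm relatively new to VBA and brand new to web scraping. I've been tasked with pulling some data from a website. I've searched here for help and tried many permutations based on what I found, but I'm not getting the result I need. A snippet from the page's DOM Explorer (using the "F12 Developer Tools") shows the following (edited to make it generic):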

<div class="nav nav-list">
    <div>
        <span class="nav-list-item">Item:</span>
        <span>
            mySearchString and other text
        </span>
    </div>
    <div>…</div>
    <div>
        <span class="nav-list-item">Retail UPC:</span>
        <span>upcNumber</span>
    </div>
    <div>…</div>
</div>
</div>

I'm trying to search for "mySearchString" and extract "and other text", and to search for "Retail UPC:" and extract "upcNumber".

I tried using nested If statements but couldn't get them to work. Below is a snippet of the latest version I've been playing with:

Dim harborDesc() as String
Dim ieObj As InternetExplorer
Set ieObj = CreateObject("InternetExplorer.Application")    
Dim htmlEle As Object
Dim itemurl As String

itemurl = "url of interest"
ieObj.navigate itemurl    'in this case, the web page has the same name as the itemNum
Do While ieObj.readyState <> READYSTATE_COMPLETE  'wait by repeating loop until ready
Loop

For Each htmlEle In ieObj.document.getElementsByClassName("nav-list-item")
                harborDesc = Split(htmlEle.innerText, htmlEle.getElementsByTagName("span")(1).innerText)
Next htmlEle

Thanks in advance for any tips/help.

Tags: excel, vba, web-scraping

Answer


You can build a nodeList and loop over its nodes looking for your search terms.

The nodeList is generated from a CSS query that uses Or syntax (the comma in the selector), which means you get every

<span class="nav-list-item">  

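but you also match the span elements that are adjacent siblings of those spans (the + combinator), for example the second span in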

<span class="nav-list-item">Retail UPC:</span> 
<span>upcNumber</span> 

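You use InStr on each node's .innerText to match your first search term. If it is found, use Replace to remove the matched text and keep the rest, as specified in the question.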

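If you find "Retail UPC:" at a given index, then upcNumber should be at the next index, because the nodeList holds the matches in document order.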


VBA:

Option Explicit
' Requires references (VBE > Tools > References) to:
'   Microsoft Internet Controls and Microsoft HTML Object Library
Public Sub FindInfo()
    Const SEARCH_TERM1 As String = "mySearchString"
    Const SEARCH_TERM2 As String = "Retail UPC:"
    Dim html As HTMLDocument, searchTermCandidates As Object
    Dim i As Long, nodeText As String, ieObj As InternetExplorer
    Set ieObj = New InternetExplorer
    With ieObj
        .Visible = True
        .Navigate2 "url"

        ' Yield to the OS until the page has loaded (4 = READYSTATE_COMPLETE)
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set html = .document

        ' Label spans plus the span immediately following each label, in document order
        Set searchTermCandidates = html.querySelectorAll("span.nav-list-item, span.nav-list-item + span")
        For i = 0 To searchTermCandidates.Length - 1
            nodeText = searchTermCandidates.item(i).innerText
            ' First search term: strip it and keep the remaining text
            If InStr(nodeText, SEARCH_TERM1) > 0 Then
                Debug.Print Trim$(Replace$(nodeText, SEARCH_TERM1, vbNullString))
            End If
            ' Second search term: the value sits at the next index, so guard the upper bound
            If Trim$(nodeText) = SEARCH_TERM2 And i < searchTermCandidates.Length - 1 Then
                Debug.Print searchTermCandidates.item(i + 1).innerText
            End If
        Next
        .Quit
    End With
End Sub
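
If you want to see what the selector returns before pointing it at the live page, you can run it against a string in memory. This is only a minimal sketch: it assumes a reference to Microsoft HTML Object Library, the sub name TestSelectorOffline is made up for illustration, and the markup is the placeholder snippet from the question rather than the real page.

```vba
Option Explicit
' Sketch only. Requires a reference to "Microsoft HTML Object Library".
' TestSelectorOffline is a hypothetical name; the markup is the question's placeholder.
Public Sub TestSelectorOffline()
    Dim html As MSHTML.HTMLDocument, nodes As Object, i As Long
    Set html = New MSHTML.HTMLDocument

    ' Load the sample markup into an in-memory document (no browser needed)
    html.body.innerHTML = _
        "<div class=""nav nav-list"">" & _
        "<div><span class=""nav-list-item"">Item:</span>" & _
        "<span>mySearchString and other text</span></div>" & _
        "<div><span class=""nav-list-item"">Retail UPC:</span>" & _
        "<span>upcNumber</span></div>" & _
        "</div>"

    ' Comma = Or; "+" = the span immediately after a nav-list-item span
    Set nodes = html.querySelectorAll("span.nav-list-item, span.nav-list-item + span")

    ' Print index and text so the label/value ordering can be inspected
    For i = 0 To nodes.Length - 1
        Debug.Print i, nodes.item(i).innerText
    Next i
End Sub
```

If the selector behaves as described above, each label span should be followed directly by its value span in the Immediate window, which is what makes the item(i + 1) lookup work.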
