VBA web scraper: using a regular expression to return InnerHTML

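Question

Using Excel VBA, I have to scrape some data from this website.

Since the relevant elements on the page have no id, I cannot use HTML.Document.GetElementById.

However, I noticed that the relevant information is always stored in a <div> section like the following: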

<div style="padding:7px 12px">Basler Versicherung AG &#214;zmen</div>

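Question: Is it possible to construct a RegExp that, possibly in a loop, returns the content between <div style="padding:7px 12px"> and the next </div>?

What I have so far returns the complete InnerHTML of the container. Obviously I still need to add some code that loops over the matches of the RegExp, which has yet to be built.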

Private Function GetInnerHTML(url As String) As String
    Dim i As Long
    Dim Doc As Object
    Dim objElement As Object
    Dim objCollection As Object

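    On Error GoTo catch
    'Internet Explorer object (ie) is already assigned elsewhere
    With ie
        .Navigate url
        'Wait until the page has finished loading (ReadyState 4 = complete)
        Do While .Busy Or .ReadyState <> 4
            DoEvents
        Loop
        GetInnerHTML = .document.getElementById("cphContent_sectionCoreProperties").innerHTML
    End With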
    Exit Function
catch:
    GetInnerHTML = Err.Number & " " & Err.Description
End Function

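Tags: regex, vba

Answer

I don't think you need a regular expression to find the content on the page. You can use the relative position of the elements to find what I believe you are after: each label in the details table is followed by a sibling element whose first child is the <div> holding the value.

Code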

Option Explicit

Public Sub GetContent()
    Dim URL     As String: URL = "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649"
    Dim IE      As Object: Set IE = CreateObject("InternetExplorer.Application")
    Dim Labels  As Object
    Dim Label   As Variant
    Dim Values  As Variant: ReDim Values(0 To 1, 0 To 5000)
    Dim i       As Long

    With IE
        .Navigate URL
        .Visible = False

        'Load the page
        Do Until IE.busy = False And IE.readystate = 4
            DoEvents
        Loop

        'Find all labels in the table
        Set Labels = IE.document.getElementByID("cphContent_pnlDetails").getElementsByTagName("label")

        'Iterate the labels, then find the divs relative to these
        For Each Label In Labels
            Values(0, i) = Label.InnerText
            Values(1, i) = Label.NextSibling.Children(0).InnerText
            i = i + 1
        Next

    End With

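    'Dump the values to Excel (skipped when no labels were found,
    'because ReDim Preserve with an upper bound of -1 would fail)
    If i > 0 Then
        ReDim Preserve Values(0 To 1, 0 To i - 1)
        ThisWorkbook.Sheets(1).Range("A1:B" & i).Value = WorksheetFunction.Transpose(Values)
    End If

    'Close IE
    IE.Quit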
End Sub
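
If a regular expression is still preferred, as the question literally asks, a minimal sketch could look like the following. It uses the late-bound VBScript.RegExp object; the function name ExtractDivContents and the demo routine are hypothetical and not part of the original post. The pattern assumes the <div> elements are not nested and that the style attribute appears roughly as in the question. The innerHTML returned by Internet Explorer may normalize whitespace or append a semicolon, so the pattern tolerates both, but it may still need adjusting against the real markup.

```vba
Option Explicit

' Hypothetical helper: returns the inner content of every
' <div style="padding:7px 12px">...</div> found in html.
Public Function ExtractDivContents(ByVal html As String) As Collection
    Dim re      As Object: Set re = CreateObject("VBScript.RegExp")
    Dim matches As Object
    Dim m       As Object
    Dim result  As New Collection

    With re
        .Global = True          ' find all matches, not just the first
        .IgnoreCase = True      ' tag names may come back as <DIV>
        ' [\s\S]*? is lazy and also spans line breaks, so each match
        ' stops at the first closing </div> (assumes no nested divs)
        .Pattern = "<div\s+style=""padding:\s*7px\s+12px;?\s*"">([\s\S]*?)</div>"
    End With

    Set matches = re.Execute(html)
    For Each m In matches
        result.Add m.SubMatches(0)   ' first capture group = inner content
    Next m

    Set ExtractDivContents = result
End Function

' Demo on a literal string taken from the question
Public Sub DemoExtractDivContents()
    Dim item As Variant
    Dim html As String
    html = "<div style=""padding:7px 12px"">Basler Versicherung AG &#214;zmen</div>"

    For Each item In ExtractDivContents(html)
        Debug.Print item
    Next item
End Sub
```

The regex does no decoding, so character references such as &#214; come back exactly as they appear in the input string, whereas the InnerText approach above returns the text as rendered. That, together with the fragility of matching on an inline style, is a reason to prefer the DOM-based solution where it works.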
