Extracting data from a web page table using VBA

Question

I want to extract data from a table in a web page. My HTML table code looks like this:

<table class=table_x_write cellspacing=1
bordercolordark=#d5d5d5 cellpadding=0 width="100%"
align=center bgcolor=#FFFFFF bordercolorlight=#a5a5a5
border=1>
<tr>
    <td class="title_1" align="right" width="20%">
        <div align=right>case number:</div></td>
    <td class="item_1" width="30%"><input type="text" name="sgdabh" tabindex="1" value="3710140068" readonly="readonly" style="width:98%" class="bg0" id="sgdabhD"></td>
    <TD class="title_1" align="right">participant number:</td>
    <td class="item_1" width="30%"><span
        onmouseover="setEnabled('cyfbh',1)"
        onmouseout="setEnabled('cyfbh',2)"> <select name="cyfbh" style="width:100%"><option value=""></option>
        <option value="371014006801" selected="selected">371014006801-passenger car</option>
        <option value="371014006802">371014006802-two/three wheeled motorcycle</option></select> </span></td>

</tr>
<tr>
    <TD class="title_1" align="right">licence number - city:</td>
    <TD class="item_1"><input type="text" name="cph_cs" value="豫" style="width:98%" class="bg0"></TD>
    <TD class="title_1" align="right">licence number - letter:</td>
    <td class="item_1"><input type="text" name="cph_zm" maxlength="5" tabindex="6" value="C" style="width:98%" class="bg0"></td>
</tr>
<tr>
    <TD class="title_1" align="right">licence number - number:</td>
    <TD class="item_1"><input type="text" name="cph_sz" maxlength="5" value="8****" style="width:98%" class="bg0"></TD>
    <TD class="title_1" align="right">type of vehicle (own definition):</td>
    <td class="item_1"><span
        onmouseover="setEnabled('cllx1',1)"
        onmouseout="setEnabled('cllx1',2)"> <select name="cllx1" style="width:100%"><option value=""></option>
        <option value="2">2-two/three electric wheeler</option>
        <option value="3" selected="selected">3-passenger car(≤9 seats)</option>
        <option value="4">4-HGV</option>
        <option value="5">5-bus(>9 seats)</option>          
        <option value="6">6-military vehicle</option></select> </span>
    </td>
</tr>
</table>

I tried the following code:

Sub data()
    x = 0
    Set objShell = CreateObject("Shell.Application")
    For Each w In objShell.Windows
        If w.Name = "Internet Explorer" And w.LocationName = "vehicle data" Then
            Set externalIE = objShell.Windows(x)
            Set externalIEDoc = externalIE.Document
            Set elemCollection = externalIE.Document.getElementsByTagName("TABLE")
            MsgBox (elemCollection.Length)
            For t = 0 To (elemCollection.Length - 1)
                For r = 0 To (elemCollection(t).Rows.Length - 1)
                    For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                        ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                    Next c
                Next r
            Next t
            MsgBox (w.LocationName)
            externalIEDoc.getelementsbyname("btnDown")(0).Click
        End If
        x = x + 1
    Next
End Sub

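But I get garbage data. I need the values in the form case number: 3710140068, participant number: 371014006801-passenger car. I need to extract data from both the text inputs and the selected options. Please help.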

Tags: vba

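Solution

Do you need anything more than the following? (Note: here I read your HTML from a worksheet cell into an HTML document; you would obtain the document in your usual way.) It is clearly not very robust, but I don't think you can simply loop over rows and then columns and read HTMLCell.innerText: the values you want are held in the value of each input element and in the selected option of each select element, not in the cell text.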


Early binding (requires a reference to the Microsoft HTML Object Library):

Code:

Option Explicit

Public Sub GetTableInfo()
    Dim html As HTMLDocument
    Set html = New HTMLDocument
    html.body.innerHTML = [A1].Text '<== You would obtain in normal way. I just read your HTML in from sheet.

    Debug.Print "Case number: " & html.querySelectorAll("table tr td[class=""item_1""] input")(0).Value
    Debug.Print "licence number - city: " & html.querySelectorAll("table tr td[class=""item_1""] input")(1).Value
    Debug.Print "licence number - letter: " & html.querySelectorAll("table tr td[class=""item_1""] input")(2).Value
    Debug.Print "licence number - number: " & html.querySelectorAll("table tr td[class=""item_1""] input")(3).Value
    Debug.Print "participant number: " & html.querySelectorAll("table tr td[class=""item_1""] option[selected=""selected""]")(0).innerText
    Debug.Print "type of vehicle (own definition): " & html.querySelectorAll("table tr td[class=""item_1""] option[selected=""selected""]")(1).innerText

End Sub

Late-bound version for the OP:

Without seeing the wider HTML context, here is a somewhat fragile piece of code:

Option Explicit

Public Sub GetTableInfo()
    Dim html As Object
    Set html = CreateObject("htmlfile")

    html.body.innerhtml = [A1].Text              '<== You would obtain in normal way. I just read your HTML in from sheet.

    Dim b As Object
    Set b = html.getElementsByTagName("table")(0).getElementsByTagName("input")

    Debug.Print "Case number = " & b(0).Value
    Debug.Print "licence number - city = " & b(1).Value
    Debug.Print "licence number - letter = " & b(2).Value
    Debug.Print "licence number - number = " & b(3).Value

    Dim c As Object
    Set c = html.getElementsByTagName("table")(0).getElementsByTagName("option")

    Dim i As Long, n As Long

    For i = 0 To c.Length - 1
        ' Fragile: relies on outerHTML containing "selected value=" for the selected option
        If InStr(c(i).outerHTML, "selected value=") > 0 Then
            n = n + 1
            If n = 1 Then
                Debug.Print "participant number: " & c(i).innerText
            ElseIf n = 2 Then
                Debug.Print "type of vehicle (own definition): " & c(i).innerText
            End If
        End If
    Next i

End Sub

Output:

The output is the same as with the querySelectorAll version. Unfortunately, 豫 is printed as ?.
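Both versions above hard-code the position of each field. As a rough sketch of a more general loop over the table, the following reads each cell according to what it contains: the value of an input, the text of the selected option, or the plain cell text. The names CellValue and DumpTable are hypothetical and not part of the original answer, and the sketch assumes, as above, that the HTML has been pasted into cell A1.

```vba
Option Explicit

' Hypothetical helper: returns the meaningful value of one table cell.
Public Function CellValue(ByVal cell As Object) As String
    Dim inputs As Object, opts As Object
    Dim i As Long

    ' Text box: the data is in the input's value, not in the cell text
    Set inputs = cell.getElementsByTagName("input")
    If inputs.Length > 0 Then
        CellValue = inputs(0).Value
        Exit Function
    End If

    ' Drop-down: return only the text of the selected option
    Set opts = cell.getElementsByTagName("option")
    If opts.Length > 0 Then
        For i = 0 To opts.Length - 1
            If opts(i).Selected Then
                CellValue = opts(i).innerText
                Exit Function
            End If
        Next i
        Exit Function                      ' nothing selected: return ""
    End If

    ' Plain label cell
    CellValue = Trim$(cell.innerText)
End Function

Public Sub DumpTable()
    Dim html As Object, tbl As Object
    Dim r As Long, c As Long

    Set html = CreateObject("htmlfile")
    html.body.innerHTML = [A1].Text        ' assumption: the HTML is stored in cell A1

    Set tbl = html.getElementsByTagName("table")(0)
    For r = 0 To tbl.Rows.Length - 1
        For c = 0 To tbl.Rows(r).Cells.Length - 1
            Debug.Print r, c, CellValue(tbl.Rows(r).Cells(c))
        Next c
    Next r
End Sub
```

To write to the worksheet instead, the Debug.Print line could be replaced with an assignment such as ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = CellValue(...), mirroring the loop in the question.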


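Parsing the HTML as text:

You can also parse the HTML as a string. This would need tailoring to your actual full HTML, but with the snippet provided you could do the following: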

Public Sub ParseHtmlAsText()
    Dim arr() As String
    ' Split on the value cells: arr(1) to arr(6) each start just after class="item_1"
    arr = Split([A1].Text, "class=""item_1""")
    ' Input values are printed with their surrounding quotes, as they appear in the HTML
    Debug.Print "Case number = " & Split(Split(arr(1), "value=")(1), Chr(32))(0)
    Debug.Print "participant number: " & Split(Split(Split(arr(2), "value=")(2), "selected=""selected"">")(1), "<")(0)
    Debug.Print "licence number - city = " & Split(Split(arr(3), "value=")(1), Chr(32))(0)
    Debug.Print "licence number - letter = " & Split(Split(arr(4), "value=")(1), Chr(32))(0)
    Debug.Print "licence number - number = " & Split(Split(arr(5), "value=")(1), Chr(32))(0)
    Debug.Print "type of vehicle (own definition): " & Split(Split(arr(6), "selected=""selected"">")(1), "<")(0)
End Sub
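The nested Split calls are compact but hard to read and raise a "Subscript out of range" error as soon as a marker is missing. One possible alternative, shown here only as a sketch, is a small helper that returns the text between two markers and an empty string when either marker is not found. TextBetween and DemoTextBetween are hypothetical names, and the demo string is a shortened copy of the first input element from the question.

```vba
Option Explicit

' Hypothetical helper: returns the text between startMarker and the next
' endMarker, searching from fromPos. Returns "" if either marker is missing.
Public Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                            ByVal endMarker As String, _
                            Optional ByVal fromPos As Long = 1) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(fromPos, source, startMarker)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startMarker)      ' first character after the marker

    endPos = InStr(startPos, source, endMarker)
    If endPos = 0 Then Exit Function

    TextBetween = Mid$(source, startPos, endPos - startPos)
End Function

Public Sub DemoTextBetween()
    Dim snippet As String
    snippet = "<input type=""text"" name=""sgdabh"" value=""3710140068"" readonly=""readonly"">"

    ' Prints: Case number = 3710140068
    Debug.Print "Case number = " & TextBetween(snippet, "value=""", """")
End Sub
```

Unlike the Split version, this returns the value without its surrounding quotes, because the opening quote is part of the start marker and the closing quote is the end marker.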
