Extract specific data from the same table class on a Java-based web page with Excel VBA

Question

A website contains a table. Its HTML is shown below.

<div id="InnerContent" style="height: 668px; position: relative; overflow: auto;">
<table class="Template" width="100%">
    <tbody><tr class="Template">
        <td class="View">
            <table class="View" width="100%">
                <tbody><tr class="View">
                    <td class="View">
<p>there are some wording.</p>
  <div class="block-indent">
    <table class="Main">
      <tbody><tr class="Main">
        <td class="Literal"><a href="/default.aspx?Guid=asda334&amp;MenuId=&amp;Action=edit&amp;Reference=saada">1234567</a></td>
        <td class="Literal"><b>Amended</b></td>
        <td class="Literal">(aadasda) Total : 2232<br></td>
      </tr>
      <tr class="Main">
        <td class="Literal"><a href="/default.aspx?Guid=sdfs2323&amp;MenuId=&amp;Action=Edit&amp;Reference=edasd">123123</a></td>
        <td class="Literal"><b>Amended</b></td>
        <td class="Literal">(adasda) Total : 123<br></td>
      </tr>
      <tr class="Main">
        <td class="Literal"><a href="/default.aspx?Guid=12321asada&amp;MenuId=&amp;Action=Edit&amp;Reference=assada">97897</a></td>
        <td class="Literal"><b>Amended</b></td>
        <td class="Literal">(bdfgbgf) Total : 999<br></td>
      </tr>
    </tbody></table>
  </div>
<table class="Main">
 <tbody><tr class="Main">
  <td class="Literal" nowrap="">abc:</td>
  <td class="Field" title=""><span class="String">030</span></td>
  <td class="Literal">&nbsp;&nbsp;&nbsp;</td>
  <td class="Literal" nowrap="">cde:</td>
  <td class="Field" title=""><span class="String">1234567890</span></td>
 </tr>

 <tr class="Main">
  <td class="Literal" nowrap="">Version:</td>
  <td class="Field" title="older Version': 02"><span class="Changed String">03</span></td>
  <td class="Literal"></td>  <td class="Literal" nowrap="">Last Amended:</td>
  <td class="Field" title="'Last Amended': 13 Sep 21"><span class="Changed Date">15 Sep 21</span></td>
  <td class="Literal">&nbsp;&nbsp;&nbsp;</td>
  <td class="Literal" nowrap="">Revised:</td>
  <td class="Field" title=""><span class="String">&nbsp;</span></td>
 </tr>

 <tr class="Main">

  <td class="Literal" nowrap="">Order:</td>
  <td class="Field" title=""><span class="String">A (Amended)</span></td>
  <td class="Literal"></td>
  <td class="Literal" nowrap="">Order2:</td>
  <td class="Field" title=""><span class="String">W (Order)</span></td>
 </tr>

Before the "block-indent" section was added to the page, I used the following code to get the data from the website:

Sheets("Sheetname").Range("E5") = ie.document.getElementById("InnerContent").getElementsByClassName("Template")(0).getElementsByClassName("View")(0).getElementsByClassName("Main")(2).getElementsByClassName("Field")(0).innerText

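The result was 02, the value for "Version", because it was the second Main table result. Since the "block-indent" section was added to the page, the number of Main tables is no longer constant: "Version" can end up in 5th place if the block-indent contains 3 Main tables, or in 6th place if it contains 4.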
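One way to see why the index moves: getElementsByClassName("Main") returns, in document order, every element whose class list contains Main. In the HTML above that includes both the <table class="Main"> elements and their <tr class="Main"> rows, so every row added inside the block-indent section shifts the index of the "Version:" row. The helper below is a hypothetical diagnostic (DescribeMainElements is not part of any library) that lists those elements with their positions. It walks getElementsByTagName("*") and checks className itself instead of calling getElementsByClassName, on the assumption that an "htmlfile" object in its default document mode may not expose getElementsByClassName.

```vba
' Hypothetical diagnostic helper: lists, in document order, every element under root
' whose class list contains "Main", formatted as "index:TAGNAME" and separated by spaces.
Function DescribeMainElements(root As Object) As String
    Dim el As Object, i As Long, out As String
    For Each el In root.getElementsByTagName("*")
        ' Pad with spaces so "Main" only matches as a whole class name, not e.g. "MainMenu"
        If InStr(" " & el.className & " ", " Main ") > 0 Then
            out = out & i & ":" & el.tagName & " "
            i = i + 1
        End If
    Next el
    DescribeMainElements = Trim$(out)
End Function
```

On the live page this could be called as Debug.Print DescribeMainElements(ie.document.getElementById("InnerContent")). Both TABLE and TR entries appear in the list, which is why a fixed index such as (2) stops pointing at the "Version:" row.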

I tried to get the whole table, but I only ever get the data from the "block-indent" section. How can I get the data for "Version"?

Tags: excel, vba

Solution


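Loop through the tables to find the one you want. Because the code below identifies each row by the text of its label cell ("Version:", "abc:") rather than by position, it keeps working when the block-indent section adds or removes tables.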

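Option Explicit

Sub demo()

    Dim oDom As Object
    Set oDom = CreateObject("HtmlFile")

    ' Read the HTML from a local file for testing (assumes table.html is saved next to
    ' the workbook); on the live page, use ie.document instead of oDom
    Dim fso As Object, ts As Object
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(ThisWorkbook.Path & "\table.html")
    oDom.body.innerHTML = ts.ReadAll
    ts.Close

    ' Late-bound Object variables, so no reference to the Microsoft HTML Object Library is needed.
    ' Check every row of every table (nested tables included) and match on the label
    ' in the first cell instead of relying on a fixed index.
    Dim tbl As Object, r As Object
    For Each tbl In oDom.getElementsByTagName("table")
        For Each r In tbl.Rows
            If r.Cells.Length > 0 Then
                If r.Cells(0).innerText = "Version:" Then
                    Debug.Print r.Cells(1).innerText   ' value next to "Version:"
                End If
                If r.Cells(0).innerText = "abc:" Then
                    Debug.Print r.Cells(4).innerText   ' value next to "cde:" in the same row
                End If
            End If
        Next
    Next
End Sub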
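To apply the same idea to the live page from the question, the label search can be wrapped in a reusable function. This is a sketch: GetValueByLabel and WriteVersion are hypothetical names, and it assumes each value sits in the cell immediately to the right of its label, as in the HTML above. Unlike the demo, it checks every cell in a row, not only the first, so labels such as "cde:" or "Last Amended:" can also be found.

```vba
' Hypothetical helper: returns the trimmed text of the cell that follows the first cell
' whose trimmed text equals label, or "" if no such cell exists.
' Uses only late-bound DOM calls, so it accepts ie.document or an "htmlfile" object.
Function GetValueByLabel(doc As Object, label As String) As String
    Dim tbl As Object, r As Object, i As Long
    For Each tbl In doc.getElementsByTagName("table")
        For Each r In tbl.Rows
            ' Stop one cell early: a label in the last cell has no value cell after it
            For i = 0 To r.Cells.Length - 2
                ' & "" guards against a Null innerText before Trim$
                If Trim$(r.Cells(i).innerText & "") = label Then
                    GetValueByLabel = Trim$(r.Cells(i + 1).innerText & "")
                    Exit Function
                End If
            Next i
        Next r
    Next tbl
End Function

' Usage sketch: ie is assumed to be an InternetExplorer object that has already loaded the page
Sub WriteVersion(ie As Object)
    Sheets("Sheetname").Range("E5") = GetValueByLabel(ie.document, "Version:")
End Sub
```

Because the function returns on the first match, a label that appears in more than one table returns only the first occurrence in document order.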
