Select nested elements by class

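Problem description

I am working on a macro to analyze a website, and I am trying to select nested elements by class. The HTML code is as follows: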

<div class="publication_info">
<table>
  <tr>
    <th class="first">A</th>
    <th class="second">B</th>
    <th class="third">C</th>
  </tr>
</table>
</div>

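I am trying to retrieve the content of the cell with the class "third" (i.e. the letter C). My approach is to first select the table by the class "publication_info" and then select the cell by the class "third", but it does not work. My code is: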

Dim html As HTMLDocument
Set html = New HTMLDocument

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", url, False
    .send
    html.body.innerHTML = .responseText
End With

With html
    Set oInfos = .getElementsByClassName("publication_info")(0).getElementsByClassName("third")(0)
End With

Strangely, the table itself can be accessed without any problem.

Any ideas? Thanks a lot for your help! Regards

Tags: excel, vba, select, web-scraping, nested

Solution


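If you can grab the table but not its contents, then I assume the contents are loaded dynamically after the page itself. In that case you have to use Internet Explorer. MSXML2.XMLHTTP does not run the page's scripts, so it cannot handle dynamic content; it only retrieves the initial, static part of the page.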
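
One way to check this assumption before switching to Internet Explorer is to look for the class name in the raw response text. The following is only a rough sketch: the procedure name CheckStaticHtmlForThird and the variable rawHtml are made up for illustration, and the plain substring search assumes the attribute is written exactly as class="third" in the markup.

```vba
Sub CheckStaticHtmlForThird()

Dim url As String
Dim rawHtml As String

  url = "Here your URL"   'placeholder, as in the question

  'Fetch the page the same way the question does
  With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", url, False
    .send
    rawHtml = .responseText
  End With

  'Assumption: the attribute appears literally as class="third"
  If InStr(1, rawHtml, "class=""third""", vbTextCompare) > 0 Then
    Debug.Print "The cell is part of the static HTML."
  Else
    Debug.Print "The cell is not in the static HTML; it is probably added later by a script."
  End If
End Sub
```

If the second message is printed, the cell is most likely inserted by a script after the initial load, which would match the assumption above.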

Try it like this:

Sub TestForThird()

Dim url As String
Dim IE As Object
Dim nodeTable As Object
Dim nodeThird As Object

  url = "Here your URL"

  'Initialize Internet Explorer and load page
  Set IE = CreateObject("InternetExplorer.Application")
  IE.Visible = True
  IE.navigate url
  Do: DoEvents: Loop Until IE.readyState = 4

  'Wait for the dynamic content to load into the table
  Application.Wait (Now + TimeSerial(0, 0, 5))

  'Get table
  Set nodeTable = IE.document.getElementsByClassName("publication_info")(0)

  'Get third from table
  Set nodeThird = nodeTable.getElementsByClassName("third")(0)

  'Show the inner text of the cell
  MsgBox nodeThird.innerText
End Sub
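
The fixed five second wait above may be too short on a slow connection and needlessly long on a fast one. As a possible refinement, the sketch below polls the document until an element with the wanted class appears or a timeout expires. The helper name WaitForClass and its parameters are hypothetical, not part of any library, and the code assumes the page is rendered in a document mode that supports getElementsByClassName, as the code above already does.

```vba
'Hypothetical helper: returns the first element with the given class,
'or Nothing if none appears within timeoutSeconds
Function WaitForClass(doc As Object, className As String, timeoutSeconds As Double) As Object

Dim deadline As Date
Dim nodes As Object

  'A Date counts days, so seconds are converted by dividing by 86400
  deadline = Now + timeoutSeconds / 86400#

  Do
    DoEvents
    Set nodes = doc.getElementsByClassName(className)
    If nodes.Length > 0 Then
      Set WaitForClass = nodes(0)
      Exit Function
    End If
  Loop While Now < deadline

  Set WaitForClass = Nothing
End Function
```

It could be used in place of the Application.Wait line, for example with Set nodeThird = WaitForClass(IE.document, "third", 10), followed by a check such as If nodeThird Is Nothing Then before reading innerText.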
