Scraping div class information using VBA

Question

I am having trouble extracting div class information from this web page: https://ca.finance.yahoo.com/quote/AAPL/financials?p=AAPL

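The income statement data used to sit under table tag names that were easy to extract, but the markup has changed. Here is a sample of the new HTML: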

<div class="D(tbrg)" data-reactid="44">
<div class="rw-expnded" data-test="fin-row" data-reactid="45">
<div class="D(tbr) fi-row Bgc($hoverBgColor):h" data-reactid="46">
<div class="D(tbc) Ta(start) Pend(15px)--mv2 Pend(10px) Bxz(bb) Py(8px) Bdends(s) Bdbs(s) 
Bdstarts(s) Bdstartw(1px) Bdbw(1px) Bdendw(1px) Bdc($seperatorColor) Pos(st) Start(0) 
Bgc($lv2BgColor) fi-row:h_Bgc($hoverBgColor) Pstart(15px)--mv2 Pstart(10px)" data-reactid="47">
<div class="D(ib) Va(m) Ell Mt(-3px) W(215px)--mv2 W(200px) " title="Total Revenue" data-reactid="48"><span class="Va(m)" data-reactid="49">Total Revenue</span>
</div>
<div class="W(3px) Pos(a) Start(100%) T(0) H(100%) Bg($pfColumnFakeShadowGradient) Pe(n) Pend(5px)" 
data-reactid="50"></div>

I would like to extract the inner text "Total Revenue" from the sample HTML above. This is my current attempt:

Sub financial()

Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLTables As MSHTML.IHTMLElementCollection
Dim HTMLRow As MSHTML.IHTMLElement
Dim HTMLCell As MSHTML.IHTMLElement

XMLPage.Open "GET", "https://ca.finance.yahoo.com/quote/AAPL/financials?p=AAPL", False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText

Set HTMLTables = HTMLDoc.getElementsByClassName("d(tbrg)")

With HTMLTables
    For Each HTMLRow In HTMLTables.getElementsByClassName("rw-expnded")
        For Each HTMLCell In HTMLRow.Children
            Debug.Print HTMLCell.innerText
        Next HTMLCell
    Next HTMLRow
End With

End Sub

Tags: excel, vba, web-scraping

Answer


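You can select on the data-test attribute with the value fin-col and take the span elements nested inside the matching elements. As a CSS selector, that is [data-test=fin-col] span: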

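Option Explicit

' Requires references (Tools > References) to "Microsoft XML, v6.0"
' and "Microsoft HTML Object Library".
Public Sub PrintFinancials()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    ' Fetch the page synchronously and load the markup into an HTML document.
    XMLPage.Open "GET", "https://ca.finance.yahoo.com/quote/AAPL/financials?p=AAPL", False
    XMLPage.send
    HTMLDoc.body.innerHTML = XMLPage.responseText

    ' Print the text of the first four spans found inside data-test="fin-col" elements.
    Dim i As Long
    For i = 0 To 3
        Debug.Print HTMLDoc.querySelectorAll("[data-test=fin-col] span").Item(i).innerText
    Next i
End Sub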
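
The loop above prints a fixed four items and assumes that at least four nodes match. As a rough sketch of a variation, the routine below stores the node list once and walks its full Length, with basic checks for a failed request or an empty result. The procedure name PrintAllFinancialCells and the variable names are illustrative only, and the sketch assumes the page still marks its cells with data-test="fin-col".

```vba
Option Explicit

' Requires references (Tools > References) to "Microsoft XML, v6.0"
' and "Microsoft HTML Object Library".
' PrintAllFinancialCells is a hypothetical name used for illustration.
Public Sub PrintAllFinancialCells()
    Dim http As New MSXML2.XMLHTTP60
    Dim html As New MSHTML.HTMLDocument
    Dim nodes As Object
    Dim i As Long

    http.Open "GET", "https://ca.finance.yahoo.com/quote/AAPL/financials?p=AAPL", False
    http.send

    ' Stop early if the server did not return the page.
    If http.Status <> 200 Then
        Debug.Print "Request failed with status " & http.Status
        Exit Sub
    End If

    html.body.innerHTML = http.responseText

    ' Assumption: the cells are still tagged data-test="fin-col".
    Set nodes = html.querySelectorAll("[data-test=fin-col] span")

    If nodes.Length = 0 Then
        Debug.Print "No matching elements found; the page markup may have changed."
        Exit Sub
    End If

    ' Index into the node list with Item(i), from 0 to Length - 1.
    For i = 0 To nodes.Length - 1
        Debug.Print nodes.Item(i).innerText
    Next i
End Sub
```

Holding the node list in a variable avoids running the selector again on every pass, and bounding the loop by Length avoids an error when fewer nodes match than expected.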
