InnerText returns empty for a specific span class

Problem description

I'm trying to retrieve the regular (126,37€) and reduced (101,10€) price from a website.

The simplified HTML looks like this:

<div class="vw-productFeatures ">
  <ul class="feature-list -price-container">
    <li class="feature -price">
      <span class="value">126,37</span>
    </li>
  </ul>
  <ul class="feature-list vw-productVoucher">
    <li class="voucher-information">Mit Code
      <span class="voucher-reduced-price">101,10</span>
    </li>
  </ul>
</div>

So I basically go step by step (div class -> ul class -> li class -> span class) and finally read innerText.
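For illustration, here is a Python sketch of the same div -> ul -> li -> span walk. It uses only the standard library's `html.parser` and runs against the simplified HTML above. This is not the asker's code, and the helper names (`TreeBuilder`, `first_match`, `get_price`) are hypothetical.

One deliberate difference from `getHTMLelemFromCol`: classes are compared as sets of tokens rather than as exact strings. As a result, a trailing space or an extra class on the element (such as `-hide`) does not cause a miss.

```python
from html.parser import HTMLParser

# The simplified markup from the question.
HTML = """
<div class="vw-productFeatures ">
  <ul class="feature-list -price-container">
    <li class="feature -price">
      <span class="value">126,37</span>
    </li>
  </ul>
  <ul class="feature-list vw-productVoucher">
    <li class="voucher-information">Mit Code
      <span class="voucher-reduced-price">101,10</span>
    </li>
  </ul>
</div>
"""

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, classes, parent=None):
        self.tag = tag
        self.classes = classes  # set of class tokens
        self.parent = parent
        self.items = []         # child Nodes and text strings, in document order

    @property
    def children(self):
        return [item for item in self.items if isinstance(item, Node)]

    def inner_text(self):
        # Rough innerText: this node's text plus all descendant text, in order
        return "".join(i if isinstance(i, str) else i.inner_text() for i in self.items)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#root", set())
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        node = Node(tag, classes, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.items.append(data)


def first_match(nodes, tag, class_names):
    """First node with this tag whose classes include every token in class_names."""
    wanted = set(class_names.split())
    return next((n for n in nodes if n.tag == tag and wanted <= n.classes), None)


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def get_price(html, path):
    """Walk div -> ul -> li -> span; path is a list of (tag, class string) pairs."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    (tag, cls), *rest = path
    node = first_match(descendants(builder.root), tag, cls)  # like getElementsByClassName
    for tag, cls in rest:
        if node is None:
            return None
        node = first_match(node.children, tag, cls)  # like .Children
    return None if node is None else node.inner_text().strip()


REGULAR = [("div", "vw-productFeatures"), ("ul", "feature-list -price-container"),
           ("li", "feature -price"), ("span", "value")]
REDUCED = [("div", "vw-productFeatures"), ("ul", "feature-list vw-productVoucher"),
           ("li", "voucher-information"), ("span", "voucher-reduced-price")]

print(get_price(HTML, REGULAR))  # 126,37
print(get_price(HTML, REDUCED))  # 101,10
```

With the prices present in the markup, both paths resolve. That is exactly what does not happen when the fetched HTML lacks the value.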

However, while I can get the regular price, the span class's innerText for the reduced price comes back empty.

This is the code I'm using:

Function getHTMLelemFromCol(HTMLColIn As MSHTML.IHTMLElementCollection, tagNameIn As String, classNameIn As String) As MSHTML.IHTMLElement
    Dim HTMLitem As MSHTML.IHTMLElement

    For Each HTMLitem In HTMLColIn
        If (HTMLitem.tagName = tagNameIn) Then
            If (HTMLitem.className = classNameIn) Then
                Set getHTMLelemFromCol = HTMLitem
                Exit For
            End If
        End If
    Next HTMLitem
End Function
Function getPrice(webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String) As String
    Dim XMLPage As New msxml2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim HTMLitem As MSHTML.IHTMLElement
    Dim HTMLObjCol As MSHTML.IHTMLElementCollection

    XMLPage.Open "GET", webSite, False
    XMLPage.send
    HTMLDoc.body.innerHTML = XMLPage.responseText

    Set HTMLObjCol = HTMLDoc.getElementsByClassName(divClass)
    Set HTMLitem = getHTMLelemFromCol(HTMLObjCol, "DIV", divClass)          ' Find the div class we are interested in first
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "UL", ulClass)     ' Find the ul class we are interested in
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "LI", liClass)     ' Find the li class we are interested in
    Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "SPAN", spanClass) ' Find the span class we are interested in

    getPrice = HTMLitem.innerText
End Function
Sub Run()
    Dim webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String, regularPrice As String, reducedPrice As String

    webSite = "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
    divClass = "vw-productFeatures "

    ' Get the regular price
    ulClass = "feature-list -price-container"
    liClass = "feature -price"
    spanClass = "value"
    regularPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)

    ' Get the reduced price
    ulClass = "feature-list vw-productVoucher -hide"
    liClass = "voucher-information"
    spanClass = "voucher-reduced-price"
    reducedPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)

    Debug.Print "Regular price: " & regularPrice
    Debug.Print "Reduced price: " & reducedPrice
End Sub

The output I get:

Regular price: 126,37
Reduced price: 

The debugger shows that it finds the correct span class, but none of its properties (including innerText) contain the price.

How can I get the reduced price?

Tags: excel vba web-scraping

Solution


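Sometimes, when much of a page's content is loaded through API calls, it's easier to use browser automation. XMLHTTP only returns the HTML the server initially sends; it does not run the page's JavaScript. Anything a script fills in afterwards (apparently including the voucher price here) therefore stays empty.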
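A quick way to see the effect is to parse markup shaped like what XMLHTTP might receive before any script runs. The markup below is hypothetical. It assumes the voucher `<span>` is empty in the server's response; the question's code does match a `-hide` class on that `<ul>`. It uses Python's `xml.etree.ElementTree` only because this snippet happens to be well-formed. The element is found, just as in the debugger, but its text is empty.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL server response before any JavaScript runs: the voucher <ul>
# still carries "-hide" (the class string the question's code matches on) and
# the <span> is assumed to be empty until a script fills it in.
STATIC_HTML = """
<div class="vw-productFeatures ">
  <ul class="feature-list vw-productVoucher -hide">
    <li class="voucher-information">Mit Code
      <span class="voucher-reduced-price"></span>
    </li>
  </ul>
</div>
"""


def find_by_class(root, tag, class_name):
    """Return the first <tag> element whose class list contains class_name."""
    for el in root.iter(tag):
        if class_name in (el.get("class") or "").split():
            return el
    return None


def element_text(el):
    """Rough innerText equivalent: all descendant text, whitespace-trimmed."""
    return "".join(el.itertext()).strip()


# ElementTree copes here only because this snippet is well-formed XML;
# real pages usually are not.
root = ET.fromstring(STATIC_HTML.strip())
span = find_by_class(root, "span", "voucher-reduced-price")
print(span is not None)          # True: the element exists...
print(repr(element_text(span)))  # '': ...but it has no text
```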

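Browser automation isn't ideal from a performance standpoint, but it's quicker to get working and can do the job in a pinch. The alternative is to monitor the network traffic between you and the server and see whether you can replicate the web request that returns the reduced price. That would be faster, but it may take some time to figure out how the request is built.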
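Once you have found that request in the browser's developer tools (Network tab), replaying it might look roughly like this with Python's standard library. The endpoint, parameter name and headers below are placeholders, not the site's actual API. `build_replay_request` and `fetch_json` are hypothetical helpers, and only `fetch_json` touches the network.

```python
import json
import urllib.parse
import urllib.request


def build_replay_request(endpoint, params, headers=None):
    """Rebuild a GET request observed in the browser's Network tab.

    endpoint, params and headers are whatever the browser actually sent;
    nothing here is specific to any real site's API.
    """
    url = endpoint + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, method="GET")
    req.add_header("Accept", "application/json")  # assumes the endpoint returns JSON
    # Copy extra headers (Referer, cookies, ...) if the server rejects bare requests
    for name, value in (headers or {}).items():
        req.add_header(name, value)
    return req


def fetch_json(req, timeout=10):
    """Send the request and decode the JSON body (performs real network I/O)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Placeholder endpoint and parameter; replace with the real request once found.
req = build_replay_request("https://example.com/api/price", {"id": "123"})
print(req.full_url)  # https://example.com/api/price?id=123
# data = fetch_json(req)  # then pick the price field out of the decoded JSON
```

The same request can be sent from VBA with `MSXML2.XMLHTTP60` by passing the replayed URL to `.Open` and setting headers with `.setRequestHeader`.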

Each approach has its trade-offs. Below is some Internet Explorer automation code that retrieves what I believe is the data you're after.

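#If VBA7 Then
    ' 64-bit Office (and any VBA7 host) requires PtrSafe on API declarations
    Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub GetReducedPrice()
    Dim text As String

    With CreateObject("internetexplorer.application")
        .navigate "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
        ' Keep waiting while IE is busy OR the document is not yet complete (READYSTATE_COMPLETE = 4)
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
        Sleep 1000 ' give the page's scripts a moment to fill in the voucher price
        text = .document.querySelector(".voucher-reduced-price").innerText
        .Quit
    End With

    Debug.Print "the reduced price is: " & text
End Sub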

The result:

the reduced price is: 101,10
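The fixed `Sleep 1000` works, but it is a guess: too short on a slow connection and wasted time on a fast one. A common alternative is to poll until the element has text or a timeout expires.

Below is that pattern as a small Python sketch. `wait_for_text` is a hypothetical helper. In the VBA version, `read_text` would correspond to reading `.document.querySelector(".voucher-reduced-price").innerText`, guarding against the element not existing yet, with a short `Sleep`/`DoEvents` between attempts.

```python
import time


def wait_for_text(read_text, timeout=10.0, interval=0.25):
    """Poll read_text() until it returns non-empty text or timeout seconds pass.

    read_text stands in for reading the element's innerText, e.g.
    .document.querySelector(".voucher-reduced-price").innerText in the IE code.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = (read_text() or "").strip()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError(f"element still empty after {timeout} s")
        time.sleep(interval)  # plays the role of Sleep/DoEvents between checks


# Simulated page: the price appears on the third check.
readings = iter(["", "", "101,10"])
print(wait_for_text(lambda: next(readings), timeout=5, interval=0.01))  # 101,10
```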

