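excel - InnerText returns empty for a specific span class
Problem description
I am trying to retrieve the regular (126,37€) and reduced (101,10€) price information from this website.
The simplified HTML code looks like this: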
<div class="vw-productFeatures ">
<ul class="feature-list -price-container">
<li class="feature -price">
<span class="value">126,37</span>
</li>
</ul>
<ul class="feature-list vw-productVoucher">
<li class="voucher-information">Mit Code
<span class="voucher-reduced-price">101,10</span>
</li>
</ul>
</div>
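So I basically go step by step (div class -> ul class -> li class -> span class) and finally read innerText.
However, while this works for the regular price, spanClass.innerText for the reduced price comes back empty.
Here is the code I'm using: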
Function getHTMLelemFromCol(HTMLColIn As MSHTML.IHTMLElementCollection, tagNameIn As String, classNameIn As String) As MSHTML.IHTMLElement
Dim HTMLitem As MSHTML.IHTMLElement
For Each HTMLitem In HTMLColIn
If (HTMLitem.tagName = tagNameIn) Then
If (HTMLitem.className = classNameIn) Then
Set getHTMLelemFromCol = HTMLitem
Exit For
End If
End If
Next HTMLitem
End Function
Function getPrice(webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String) As String
Dim XMLPage As New msxml2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim HTMLitem As MSHTML.IHTMLElement
Dim HTMLObjCol As MSHTML.IHTMLElementCollection
XMLPage.Open "GET", webSite, False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
Set HTMLObjCol = HTMLDoc.getElementsByClassName(divClass)
Set HTMLitem = getHTMLelemFromCol(HTMLObjCol, "DIV", divClass) ' Find the div class we are interested in first
Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "UL", ulClass) ' Find the ul class we are interested in
Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "LI", liClass) ' Find the li class we are interested in
Set HTMLitem = getHTMLelemFromCol(HTMLitem.Children, "SPAN", spanClass) ' Find the span class we are interested in
getPrice = HTMLitem.innerText
End Function
Sub Run()
Dim webSite As String, divClass As String, ulClass As String, liClass As String, spanClass As String, regularPrice As String, reducedPrice As String
webSite = "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
divClass = "vw-productFeatures "
' Get the regular price
ulClass = "feature-list -price-container"
liClass = "feature -price"
spanClass = "value"
regularPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)
' Get the reduced price
ulClass = "feature-list vw-productVoucher -hide"
liClass = "voucher-information"
spanClass = "voucher-reduced-price"
reducedPrice = getPrice(webSite, divClass, ulClass, liClass, spanClass)
Debug.Print "Regular price: " & regularPrice
Debug.Print "Reduced price: " & reducedPrice
End Sub
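The lookup in getHTMLelemFromCol compares className with `=`, so it only matches when the entire attribute string is identical, including trailing spaces ("vw-productFeatures ") and state classes such as "-hide". A minimal sketch of a token-based check, assuming the same Microsoft HTML Object Library reference; `HasClass` and `getHTMLelemByToken` are hypothetical helpers, not part of the original code:

```vba
' Hypothetical helper: True if classNameAttr (e.g. "feature-list vw-productVoucher -hide")
' contains wanted as one whitespace-separated class token.
Function HasClass(ByVal classNameAttr As String, ByVal wanted As String) As Boolean
    Dim token As Variant
    If Len(wanted) = 0 Then Exit Function
    ' Treat tabs and line breaks like spaces so Split needs only one separator
    classNameAttr = Replace(Replace(Replace(classNameAttr, vbTab, " "), vbCr, " "), vbLf, " ")
    For Each token In Split(classNameAttr, " ")
        If token = wanted Then
            HasClass = True
            Exit Function
        End If
    Next token
End Function

' Hypothetical variant of getHTMLelemFromCol that matches one class token
' instead of the exact className string. Returns Nothing if no element matches.
Function getHTMLelemByToken(HTMLColIn As MSHTML.IHTMLElementCollection, tagNameIn As String, classToken As String) As MSHTML.IHTMLElement
    Dim HTMLitem As MSHTML.IHTMLElement
    For Each HTMLitem In HTMLColIn
        If HTMLitem.tagName = tagNameIn Then
            If HasClass(HTMLitem.className, classToken) Then
                Set getHTMLelemByToken = HTMLitem
                Exit Function
            End If
        End If
    Next HTMLitem
End Function
```

For example, `getHTMLelemByToken(HTMLitem.Children, "UL", "vw-productVoucher")` would find the voucher list whether or not "-hide" is present. This only makes the lookup less brittle; it does not by itself put text into an empty span.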
The output I get:
Regular price: 126,37
Reduced price:
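The debugger shows that it does find the correct span element, but none of its properties (including innerText) contain the price.
How can I get the reduced price?
Solution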
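One way to check where the empty value comes from is to look at the raw HTML that XMLHTTP actually receives. XMLHTTP only downloads the response; it never runs the page's JavaScript. The sketch below uses a hypothetical helper `RawExcerpt` and hard-codes the price string "101,10" from the question as an assumption (the live price may differ):

```vba
' Hypothetical helper: returns the text around the first occurrence of marker
' in html (radius characters on each side), or "" if marker is absent.
Function RawExcerpt(ByVal html As String, ByVal marker As String, Optional ByVal radius As Long = 150) As String
    Dim pos As Long, startPos As Long
    pos = InStr(1, html, marker, vbBinaryCompare)
    If pos = 0 Then Exit Function
    startPos = pos - radius
    If startPos < 1 Then startPos = 1
    RawExcerpt = Mid$(html, startPos, Len(marker) + 2 * radius)
End Function

' Prints what the server sends for the voucher span, before any JavaScript runs.
' Requires the same "Microsoft XML, v6.0" reference as getPrice.
Sub InspectRawResponse()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim html As String
    XMLPage.Open "GET", "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890", False
    XMLPage.send
    html = XMLPage.responseText
    Debug.Print "Raw HTML contains 101,10: " & (InStr(1, html, "101,10", vbBinaryCompare) > 0)
    Debug.Print RawExcerpt(html, "voucher-reduced-price")
End Sub
```

If the price string is absent and the excerpt shows an empty voucher-reduced-price span, the value is most likely filled in by a script after the page loads, which matches the empty innerText seen in the debugger.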
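Sometimes, when much of a page's content depends on API calls, it is easier to use browser automation.
It is not ideal from a performance standpoint, but it is quicker to get working and can do the job in a pinch. The alternative is to monitor the network traffic between you and the server and see whether you can replicate the web request that returns the reduced price. That would be faster, but it may take some time to figure out how it works.
Each approach has trade-offs. Below is some Internet Explorer automation code that retrieves the data I believe you are after.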
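#If VBA7 Then
Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long) ' PtrSafe is required in 64-bit Office
#Else
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If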
Sub GetReducedPrice()
Dim text As String
With CreateObject("internetexplorer.application")
.navigate "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
Do While .Busy Or .readyState <> 4: DoEvents: Loop ' wait until the browser is idle AND the document is complete
Sleep 1000 ' give the page's scripts a moment to fill in the voucher price
text = .document.querySelector(".voucher-reduced-price").innerText
.Quit
End With
Debug.Print "the reduced price is: " & text
End Sub
The result is:
the reduced price is: 101,10
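The fixed `Sleep 1000` assumes the voucher price is filled in within one second. A sketch that instead polls until the span has text or a timeout expires, using a hypothetical `WaitForText` function; it avoids the Sleep declaration by busy-waiting with DoEvents and the VBA Timer function, which costs some CPU while waiting:

```vba
' Hypothetical helper: polls ie.document until the element matched by
' cssSelector has non-empty innerText, or timeoutSeconds elapse.
' Returns the text, or "" on timeout. Busy-waits with DoEvents (uses some CPU).
Function WaitForText(ByVal ie As Object, ByVal cssSelector As String, Optional ByVal timeoutSeconds As Double = 10) As String
    Dim startT As Double, elapsed As Double
    Dim el As Object
    startT = Timer
    Do
        DoEvents
        Set el = Nothing
        On Error Resume Next            ' the document may not be accessible yet
        Set el = ie.document.querySelector(cssSelector)
        On Error GoTo 0
        If Not el Is Nothing Then
            If Len(Trim$(el.innerText & "")) > 0 Then
                WaitForText = el.innerText
                Exit Function
            End If
        End If
        elapsed = Timer - startT
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer wrapped past midnight
    Loop While elapsed < timeoutSeconds
End Function

Sub GetReducedPriceWithWait()
    Dim ie As Object, text As String
    Set ie = CreateObject("InternetExplorer.Application")
    ie.navigate "https://www.rakuten.de/produkt/msi-b450-tomahawk-max-atx-mainboard-4x-ddr4-max-64gb-1x-dvi-d-1x-hdmi-14-1x-usb-c-31-2843843890"
    Do While ie.Busy Or ie.readyState <> 4: DoEvents: Loop
    text = WaitForText(ie, ".voucher-reduced-price", 10)
    ie.Quit
    Debug.Print "the reduced price is: " & text
End Sub
```

An empty result after the timeout means the span never received text within that window, rather than silently printing whatever was there after a fixed delay.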