Excel VBA: extract aria-label value

Problem description

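I'm trying to extract the text in the aria-label attribute, but what I have doesn't seem to work. I'm able to extract the href value with the same code, so I assumed it would work here too. Any help would be appreciated.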

The URL I'm using is https://www.facebook.com/marketplace/item/328932021226229. Screenshot of the relevant HTML:

<div class="_3-8z">
  <div>
    <span class="_3ziq">Seller Information</span>
    <div class="clearfix" direction="left">
      <div class="_ohe lfloat">
        <div>
          <a class="img _8o _8t" aria-label="John Smith, View seller profile"
      href="#" data-hovercard="/ajax/hovercard/user.php?id=100002935356728&amp;extragetparams=%7B%22hc_location%22%3A%22marketplace_hovercard%22%2C%22existingThreadID%22%3Anull%2C%22forSaleItemID%22%3A%22328932021226229%22%2C%22name%22%3A%22Zsigmond%20Lali%22%7D" 
      modalProps="[object Object]" 
      profileID="100002935356728" resource="[object Object]">

Sub Macro2()

    marker = 0
    Set objShell = CreateObject("Shell.Application")
    IE_count = objShell.Windows.Count
    For x = 0 To (IE_count - 1)
        On Error Resume Next    ' sometimes more web pages are counted than are open
        my_url = objShell.Windows(x).document.Location
        my_title = objShell.Windows(x).document.Title

        If my_title Like "Marketplace" & "*" Then 'compare to find if the desired web page is already open
            Set IE = objShell.Windows(x)
            marker = 1
            Exit For
        Else
        End If
    Next

    Dim aNodeList As Object, i As Long
    Set aNodeList = IE.document.querySelectorAll(".img _8o _8t[aria-label]")
    For i = 0 To aNodeList.Length - 1
        ActiveSheet.Cells(i + 2, 2) = aNodeList.Item(i)
    Next

End Sub

Tags: html, excel, vba, web-scraping

Solution


At least for me, the HTML in your snippet does not appear at the link. Also, the selector does not match the aria-label you have shown.

This

._3cgd[aria-label]

looks for elements with the class name _3cgd that have an aria-label attribute. There aren't any in your snippet.

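I would expect, though I can't test it properly for the reasons above, that you can use getAttribute; failing that, split the .outerHTML of the target element.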

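Below is a selector based on the snippet you have shown; you may need to adjust it for your actual HTML. Testing against your snippet alone, getAttribute yields null, but since the syntax is correct I'm not sure whether the live page would behave differently. Splitting outerHTML returns John Smith, View seller profile.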

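' Target the seller link by its exact class list and its profileid attribute
With IE.document.querySelector("a[class='img _8o _8t'][profileid='100002935356728']")
   Debug.Print .getAttribute("aria-label")   ' read the attribute directly
   ' Fallback: take the text between aria-label=" and the next double quote in outerHTML
   Debug.Print Split(Split(.outerHTML, "aria-label=" & Chr$(34))(1), Chr$(34))(0)
End With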

Above I only use querySelector and target profileid. A more general version, covering every element with the matching class that has an aria-label:

Dim eles As Object, i As Long
' All <a> elements whose class attribute is exactly "img _8o _8t" and that carry an aria-label
Set eles = IE.document.querySelectorAll("a[class='img _8o _8t'][aria-label]")
For i = 0 To eles.Length - 1
    With eles.item(i)
        Debug.Print .getAttribute("aria-label")
        Debug.Print Split(Split(.outerHTML, "aria-label=" & Chr$(34))(1), Chr$(34))(0)
    End With
Next
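The nested Split fallback used above raises a "Subscript out of range" error whenever an element's outerHTML contains no aria-label=" text. A minimal sketch of a more forgiving fallback follows; the function name AriaLabelFromHtml is hypothetical, and it assumes (like the Split version) that the attribute value is serialized in double quotes. It returns an empty string instead of failing and does not decode HTML entities such as &amp;.

```vba
' Hypothetical helper: returns the aria-label value found in an element's outerHTML,
' or "" when the attribute is absent. Assumes the value is wrapped in double quotes.
Function AriaLabelFromHtml(ByVal html As String) As String
    Dim prefix As String, startPos As Long, endPos As Long
    prefix = "aria-label=" & Chr$(34)
    startPos = InStr(1, html, prefix, vbTextCompare)
    If startPos = 0 Then Exit Function          ' attribute missing: return ""
    startPos = startPos + Len(prefix)           ' first character of the value
    endPos = InStr(startPos, html, Chr$(34))    ' closing double quote
    If endPos = 0 Then Exit Function            ' unterminated value: return ""
    AriaLabelFromHtml = Mid$(html, startPos, endPos - startPos)
End Function
```

Inside either With block it could replace the Split line, e.g. Debug.Print AriaLabelFromHtml(.outerHTML).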
