Scraping with XMLHTTP raises an error at a specific class name

Problem description

I am trying to scrape a website with the code below to extract names and contact details.

Sub Test()
Dim htmlDoc         As Object
Dim htmlDoc2        As Object
Dim elem            As Variant
Dim tag             As Variant
Dim dns             As String
Dim pageSource      As String
Dim pageSource2     As String
Dim url             As String
Dim row             As Long

row = 2
dns = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", dns, True
    .send

    While .readyState <> 4: DoEvents: Wend

    If .statusText <> "OK" Then
        MsgBox "ERROR" & .Status & " - " & .statusText, vbExclamation
        Exit Sub
    End If

    pageSource = .responseText
End With

Set htmlDoc = CreateObject("htmlfile")
htmlDoc.body.innerHTML = pageSource

Dim xx
'Error occurs here
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")

Set htmlDoc = Nothing
Set htmlDoc2 = Nothing
End Sub

When I try to use this line:

Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")

I get the error "Object doesn't support this property or method" (438). Can you help me? I am not very experienced with scraping.

Tags: excel, vba, web-scraping

Solution


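The late-bound object returned by CreateObject("htmlfile") does not expose getElementsByClassName, which is the likely cause of error 438; an early-bound HTMLDocument does. To get the names and their corresponding phone numbers, you can try the following snippet: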

Sub GetProfileInfo()
    Const URL$ = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page="
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, R&, P&

    For P = 1 To 3 'set the upper bound to the last page you want to traverse
        With Http
            .Open "GET", URL & p, False
            .send
            Html.body.innerHTML = .responseText
        End With

        For Each post In Html.getElementsByClassName("ldb-contact-summary")
            With post.querySelectorAll(".ldb-contact-name a")
                If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
            End With

            With post.getElementsByClassName("ldb-phone-number")
                'R > 0 guards against writing to row 0 when no name has been found yet
                If .Length > 0 And R > 0 Then Cells(R, 2) = .item(0).innerText
            End With
        Next post
    Next p
End Sub

Add references to the following libraries (Tools > References in the VBA editor) to run the script above:

Microsoft XML, v6.0
Microsoft HTML Object Library
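
The snippet above writes whatever the server returns into the document, even if the request failed. As a minimal sketch, assuming the same two references are set, the fetch step could be wrapped in a helper that checks the numeric HTTP status before parsing. The name FetchDocument is hypothetical and not part of the original answer:

```vba
' Hypothetical helper, not part of the original answer.
' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library.
Function FetchDocument(ByVal pageUrl As String) As MSHTML.HTMLDocument
    Dim http As New MSXML2.XMLHTTP60
    Dim doc As New MSHTML.HTMLDocument

    http.Open "GET", pageUrl, False      ' False = synchronous request
    http.send

    ' Leave the return value as Nothing unless the server answered 200
    If http.Status <> 200 Then Exit Function

    doc.body.innerHTML = http.responseText
    Set FetchDocument = doc
End Function
```

A caller would then test the result before using it, for example: Set Html = FetchDocument(URL & P) followed by If Html Is Nothing Then Exit For. Comparing .Status with 200 is used here instead of comparing .statusText with "OK", since the status text is not guaranteed to be populated the same way by every server.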
