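excel - Scraping with XMLHTTP raises an error at a specific class name
Question
I am trying to scrape a website with the code below to extract names and contact details: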
Sub Test()
    Dim htmlDoc As Object
    Dim htmlDoc2 As Object
    Dim elem As Variant
    Dim tag As Variant
    Dim dns As String
    Dim pageSource As String
    Dim pageSource2 As String
    Dim url As String
    Dim row As Long
    row = 2
    dns = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", dns, True
        .send
        While .readyState <> 4: DoEvents: Wend
        If .statusText <> "OK" Then
            MsgBox "ERROR" & .Status & " - " & .statusText, vbExclamation
            Exit Sub
        End If
        pageSource = .responseText
    End With
    Set htmlDoc = CreateObject("htmlfile")
    htmlDoc.body.innerHTML = pageSource
    Dim xx
    ' The error occurs on the next line
    Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
    Set htmlDoc = Nothing
    Set htmlDoc2 = Nothing
End Sub
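When execution reaches this line:
Set xx = htmlDoc.getElementsByClassName("ldb-contact-summary")
I get the error "Object doesn't support this property or method" (438). Can you help me? I don't have much experience with scraping.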
Answer
To get the names and their corresponding phone numbers, you can try the following snippet:
Sub GetProfileInfo()
    Const URL$ = "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/?page="
    ' Early-bound objects: these need the two library references listed below
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, R&, P&

    For P = 1 To 3 ' set the upper bound to the last page you want to traverse
        With Http
            .Open "GET", URL & P, False ' False = synchronous request
            .send
            Html.body.innerHTML = .responseText
        End With

        ' Loop over every element with the class "ldb-contact-summary"
        For Each post In Html.getElementsByClassName("ldb-contact-summary")
            With post.querySelectorAll(".ldb-contact-name a")
                ' Move to a new row only when a name is found
                If .Length Then R = R + 1: Cells(R, 1) = .item(0).innerText
            End With
            With post.getElementsByClassName("ldb-phone-number")
                ' Write the phone number next to the name, if there is one
                If .Length Then Cells(R, 2) = .item(0).innerText
            End With
        Next post
    Next P
End Sub
References to add (Tools > References in the VBA editor) to run the script above:
Microsoft XML, v6.0
Microsoft HTML Object Library
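A likely explanation for error 438 in the question is that the late-bound object returned by CreateObject("htmlfile") does not expose getElementsByClassName, whereas an early-bound HTMLDocument (as used in the answer) does. The following is a minimal sketch of the question's routine with only that change. The name TestEarlyBound and the variable items are made up for illustration, and the sketch assumes the Microsoft HTML Object Library reference is set and that the page still uses the "ldb-contact-summary" class.

```vba
Sub TestEarlyBound()
    ' Assumption: a reference to "Microsoft HTML Object Library" is enabled.
    Dim http As Object
    Dim htmlDoc As MSHTML.HTMLDocument ' early-bound instead of As Object
    Dim items As Object

    ' The request object can stay late-bound; only the document type changes.
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.zillow.com/detroit-mi/real-estate-agent-reviews/", False
    http.send
    If http.Status <> 200 Then
        MsgBox "ERROR " & http.Status & " - " & http.statusText, vbExclamation
        Exit Sub
    End If

    Set htmlDoc = New MSHTML.HTMLDocument
    htmlDoc.body.innerHTML = http.responseText

    ' With the early-bound document this call should no longer raise error 438.
    Set items = htmlDoc.getElementsByClassName("ldb-contact-summary")
    Debug.Print items.Length & " matching elements"
End Sub
```

If the count printed to the Immediate window is 0, the site may have changed its markup or returned a different page to the script than it shows in a browser.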