vba - VBA 抓取:需要对象
问题描述
Dim oIE As InternetExplorer
Dim wwwAdd$, NazwaArk$, AdrStrony$
Dim doc As HTMLDocument
Dim wElement As HTMLAnchorElement
Dim row As Long, col%, n%, pozDat% ' co to są te procenty??
Dim PoprzedniElement As HTMLAnchorElement
Dim Tekst As String
Dim NrOferty As Variant
Dim i, j As Integer
Dim IleSekund As Integer
Dim newHour As Variant
Dim newMinute As Variant
Dim newSecond As Variant
Dim waitTime As Variant
Dim link As Integer
Dim maxlink As Integer
Set oIE = New SHDocVw.InternetExplorer
oIE.Visible = True
oIE.Navigate AdrStrony, 2
row = row + 1
Do Until oIE.ReadyState = tagREADYSTATE.READYSTATE_COMPLETE: Loop
Set doc = oIE.Document
For Each wElement In doc.getElementsByClassName("info")(0).getElementsByTagName("a")(3) ' "a" = a href
MsgBox (wElement)
Sheets("Oferty").Cells(1, 1).Value = wElement
Next wElement
oIE.Quit
MS VBA 说该对象是必需的,但是当我在第一行(实际上是第 25 行:)中删除(3)时,“一切都很好”。它给了我三个 MsgBoxes(1. 我想要的链接,2. 空的,3. 我想要的链接)。所以最简单的方法是采取第三种选择。
我通过 QHarr 实现了代码,但我想从网站的一页中获取 20 个链接(下一步它将在多页上执行),所以我做了一个简单的 For Next:
Set oIE = New SHDocVw.InternetExplorer
oIE.Visible = True
oIE.Navigate AdrStrony, 2 'AdrStrony is the the www site address
For iNo = 0 To 20
With oIE
While .Busy Or .readyState < 4: DoEvents: Wend 'I personally don't understand this line
Set doc = oIE.document
For Each wElement In doc.getElementsByClassName("info")(iNo).getElementsByTagName("a")
'MsgBox (wElement)
If iRow <> 2 Then 'Works like a charm, but how the magic works? What it does actually?
iRow = iRow + 1
On Error Resume Next '<=In case no href attribute
Worksheets("Oferty").Cells(iRow, 1) = wElement.getAttribute("href")
'Worksheets("Oferty").Cells(iRow, 1) = wElement.innerText
On Error GoTo 0
End If
.Quit
Next wElement
'other code
End With
Next iNo
但它显示自动化错误(未知错误)。
解决方案
新信息:
根据您的新信息,这里是您如何使用 XHR 在没有打开浏览器的情况下在第一页上获得 20 个列出的属性:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, html As New HTMLDocument, i As Long
Const BASE_URL As String = "http://www.kontrakt.szczecin.pl"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://www.kontrakt.szczecin.pl/lista-ofert/?f_location_locality[0]=szczecin&submit=Szukaj&offset=0", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
Dim propertyList As Object
With html
.body.innerHTML = sResponse
Set propertyList = html.querySelectorAll(".info")
For i = 0 To propertyList.Length - 1
Debug.Print BASE_URL & Split(Split(propertyList.item(i).innerHTML, "about:/")(1), Chr$(34) & ">")(0)
Next
End With
End With
End Sub
您可以通过以下方式获取结果的页数(如果您想稍后循环它们):
Dim numPages As Long
numPages = html.querySelectorAll(".pagination li")(6).innerText
它不如 健壮.pagination li:nth-last-child(3)
,但似乎 VBA 不支持该选择器。
浏览器也一样:
Option Explicit
Public Sub SubCentalineAutomation()
Dim myIE As InternetExplorer, propertyList As Object, i As Long
Const BASE_URL As String = "http://www.kontrakt.szczecin.pl"
Set myIE = New InternetExplorer
With myIE
.navigate "http://www.kontrakt.szczecin.pl/lista-ofert/?f_location_locality[0]=szczecin&submit=Szukaj&offset=0"
.Visible = True
While .Busy Or .readyState < 4: DoEvents: Wend
Set propertyList = .document.querySelectorAll("#properties header > a")
For i = 0 To propertyList.Length - 1
Debug.Print BASE_URL & propertyList.item(i).getAttribute("href")
Next i
'.Quit '<== Remember to quit application
End With
End Sub
旧信息(您的原始问题):
您想在一个集合中获得 3 个链接中的 2 个。这意味着你不想要doc.getElementsByClassName("info")(0).getElementsByTagName("a")(3)
. 集合中的 3 个索引并返回单个元素。显示相关的 HTML 和预期的链接,将有助于澄清这里的问题。
您没有显示 tagREADYSTATE
. 我会去 While oIE.Busy Or oIE.readyState < 4: DoEvents: Wend
确保页面加载。
在For Each
您显示的循环中,如果选择有效,则对象将返回为wElement
. 然后你尝试Sheets("Oferty").Cells(1, 1).Value = wElement
. 我希望,您要么使用 . innerText
的属性wElement
,或使用.getAttribute("href")
, 来获得a
标签的链接。
此外,我将使用 Worksheets 集合,以及循环内的递增行变量,并使用明确的名称,例如 iRow。要仅获取第 1 和第 3 个链接,您可以测试 iRow 的值以确定是否检索链接,例如If iRow <> 2 Then
.
一个可能看起来像的例子:
Option Explicit
Public Sub test1()
'Other code
Dim iRow
With oIE
While .Busy Or .readyState < 4: DoEvents: Wend
Set doc = oIE.document
For Each wElement In doc.getElementsByClassName("info")(0).getElementsByTagName("a")
If iRow <> 2 Then
iRow = iRow + 1
On Error Resume Next '<=In case no href attribute
Worksheets("Oferty").Cells(iRow, 1) = wElement.getAttribute("href")
' Worksheets("Oferty").Cells(iRow, 1) = wElement.innerText
On Error GoTo 0
End If
Next wElement
'other code
.Quit
End With
End Sub
推荐阅读
- html - 使 div 滚动而不是扩展以适应其内容
- sas - 在 Aginity 上将 sas 文件加载到 netezza
- sql - 需要帮助 - 创建触发器以更新列
- sharepoint - 在一个 SharePoint 上开发 SharePoint Web 部件,但将部署在完全不同的一个上
- wildfly - 迁移期间的 EAP 7.X ClassNotFoundException
- excel - 根据工作表名称自动向各个收件人发送电子邮件
- python - 如何使用python通过FT上传许多文件?
- twitter-bootstrap - 悬停内部对象时如何删除工具提示?
- if-statement - 在时序收敛、多个“ifs”或“if-else”方面哪个更快?
- ajax - 如何将字符串参数传递给自定义存储库方法?