首页 > 解决方案 > VBA网页抓取代码错误与宏链接

问题描述

我已经完成了我创建的这段代码,看起来我遇到了一些问题。当您手动进入代码并运行它时,代码可以正常工作,但是每次我尝试使用宏按钮自动运行代码时都会遇到问题。

我收到运行时错误“70”:权限被拒绝。我不确定为什么代码在我自动运行时会跳闸并抛出此代码。

这个想法是能够在 excel 中输入城镇和州,它将搜索这两个网站的数据。

我附上了下面的代码

'start a new subroutine called SearchBot
Sub SearchBot1()

'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim HTMLinputs As MSHTML.IHTMLElementCollection
Dim y As Integer 'integer variable we'll use as a counter
Dim result As String 'string variable that will hold our result link

'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer

'make IE browser visible (False would allow IE to run in the background)
'objIE.Visible = True

'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://www.zillow.com/orange-county-ny/home-values/"

'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

'in the search box put cell "A2" value, the word "in" and cell "C1" value
objIE.document.getElementById("local-search").Value = _
  Sheets("Sheet2").Range("B3").Value & ", " & Sheets("Sheet2").Range("B4").Value

'click the 'go' button
Set HTMLinputs = objIE.document.getElementsByTagName("button")
For Each input_element In HTMLinputs
If input_element.getAttribute("name") = "SubmitButton" Then

    input_element.Click
    Exit For
End If
Next input_element


'wait again for the browser
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop


'price for home

Set Doc = objIE.document
Dim cclass As String
cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(0).innerText)
'MsgBox (cclass)

Dim aclass As Variant
aclass = Split(cclass, " ")
Range("Market_Price").Value = aclass(0)


'1-YR Forecast
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(1).innerText)
'MsgBox (cclass)
 Dim bclass As Variant
bclass = Split(cclass, " ")
Range("yr_forecast").Value = bclass(0)

'Median List Price
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(2).innerText)
'MsgBox (cclass)
 Dim dclass As Variant
 dclass = Split(cclass, " ")
 Range("Median_List_Price").Value = dclass(0)

'Median Sale Price
 cclass = Trim(Doc.getElementsByClassName("value-info-list")(0).Children(3).innerText)
'MsgBox (cclass)
 Dim eclass As Variant
 eclass = Split(cclass, " ")
 Range("Median_Sale_Price").Value = eclass(0)

'Health of market

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(0).innerText)
'MsgBox (cclass)

 Dim fclass As Variant
 fclass = Split(cclass, " ")
 Range("Healthy").Value = fclass(0)

' Home with Negative Equity

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(1).innerText)
'MsgBox (cclass)

 Dim gclass As Variant
 gclass = Split(cclass, " ")
 Range("Home_With_Negative_Equity").Value = gclass(0)

'Delinquent on Mortgage

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(1).Children(2).innerText)
'MsgBox (cclass)

 Dim hclass As Variant
 hclass = Split(cclass, " ")
 Range("Delinquent_On_Mortgage").Value = hclass(0)


'Listings with price cut

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(2).Children(2).innerText)
'MsgBox (cclass)

 Dim iclass As Variant
 iclass = Split(cclass, " ")
 Range("Price_Cut").Value = iclass(0)


'Breakeven Horizon

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(2).innerText)
'MsgBox (cclass)

 Dim jclass As Variant
 jclass = Split(cclass, " ")
 Range("Breakeven").Value = jclass(0)

'Rent List Price

cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(3).innerText)
'MsgBox (cclass)

Dim kclass As Variant
kclass = Split(cclass, " ")
Range("Rent_List_Price").Value = kclass(0)

'Rent List Price/sq ft

 cclass = Trim(Doc.getElementsByClassName("value-info-list")(3).Children(4).innerText)
 'MsgBox (cclass)

 Dim lclass As Variant
 lclass = Split(cclass, " ")
 Range("Rent_sq").Value = lclass(0)

'close the browser
 objIE.Quit




Set ws = ThisWorkbook.Worksheets("Engine")

'initiating a new instance of Internet Explorer and asigning it to objIE
Set objIE = New InternetExplorer

'make IE browser visible (False would allow IE to run in the background)
'objIE.Visible = True

'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://datausa.io/profile/geo/" & ws.Range("City_Search").Value & "-" & ws.Range("State_Search").Value

'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

 Set Doc = objIE.document
 Dim Data As String
 Data = Trim(Doc.getElementsByClassName("stat")(0).Children(1).innerText)
'MsgBox (Data)

 Dim adata As Variant
 adata = Split(Data, "")
 ws.Range("Population").Value = adata(0)

  End Sub




  'exit our SearchBot subroutine

如果有人可以帮助我,将不胜感激。我将继续解决问题,看看我是否可以让它工作。如果您对公式有任何疑问,请询问。

谢谢

标签: htmlvbaexcelweb-scraping

解决方案


关于您的代码的注释:

  1. 我无法复制您的错误。有时您可能会遇到验证码请求。您可以通过设置来检查这一点objIE.Visible = True
  2. 您正在打开一个 IE 实例,退出它,然后重新打开。您可以继续使用现有的并导航到新的 URL。这将减少代码量。
  3. 您有未声明的变量,这意味着您没有Option Explicit在代码顶部使用。
  4. 有点不清楚您打算在某些时候使用哪些工作表,因为您没有完全限定范围与它们的父工作表对象。因此我的假设如下。当您不符合条件时,范围对象将使用 Activesheet。
  5. 您的人口统计数据会返回下降百分比以及实际人口数据 - 这是您将许多其他结果拆分以获得子字符串时的意图吗?
  6. 通过以valueeg类为目标doc.getElementsByClassName("value"),您可以遍历返回的集合并完全避免使用Split并大大减少代码量。
  7. 您可以关闭屏幕更新以获得更快的结果。
  8. 根据您执行此操作的频率,您可以切换到 XHR,如下所示,这要快得多。

我的代码:

我不得不做出一些假设,但以下内容从页面中获取信息。我假设所有信息都来自和去往,sheet2除了Population. 我在下面的屏幕截图中显示它,以便您可以一起查看所有结果。


XHR 和提琴手

我曾经fiddler在进行选择并按下搜索按钮时检查网络流量。这表明我提出了一个GET请求,我使用fiddler的检查员提供的信息来制定正确的GET请求。

提琴手结果:

提琴手信息

请注意,如果您在太短的时间内尝试太多GET请求,您最终会得到验证码。


CSS 选择器:

检查检索到的页面的 HTML,我可以看到所有相关值的 className 为value

班级名称

我可以通过使用.valuewhere"."表示类的 CSS 选择器来定位这些元素。

匹配元素示例:

CSS 查询

由于匹配的元素很多,我使用 的.querySelectorAll方法document来检索NodeList包含所有匹配项的 a。我遍历.LengthNodeList访问所需的值。我使用Select Case通过索引位置来确定将值写入哪个命名范围。您可能需要验证我是否正确。


VBA:

Option Explicit
Public Sub GetInfo()
    Dim html As New MSHTML.HTMLDocument, ws As Worksheet
    Application.ScreenUpdating = False
    Set ws = ThisWorkbook.Worksheets("Sheet2")   '<== Must be sheet name where named ranges are
    Dim searchString As String, URL As String
    searchString = Replace$(LCase$(ws.Range("B3")), Chr$(32), Chr$(45)) & Chr$(45) & LCase$(ws.Range("B4")) 'Santa Ana in B3, CA in B4
    URL = "https://www.zillow.com/" & searchString & "/home-values/"

    html.body.innerHTML = GetHTML(URL)

    Dim aNodeList As Object, i As Long
    Set aNodeList = html.querySelectorAll(".value")
    With ws
        For i = 0 To aNodeList.Length - 1
            Select Case i
            Case 0 ' ZHVI
                .Range("Market_Price") = aNodeList.item(i).innerText
            Case 1                               ' 1-yr forecast
                .Range("yr_forecast") = aNodeList.item(i).innerText
            Case 2                               'Median listing price
                .Range("Median_List_Price") = aNodeList.item(i).innerText
            Case 3                               'Median sale price
                .Range("Median_Sale_Price") = aNodeList.item(i).innerText
            Case 4                               'Market Health Index
                .Range("Healthy") = aNodeList.item(i).innerText
            Case 5                               'Homes with negative equity
                .Range("Home_With_Negative_Equity") = aNodeList.item(i).innerText
            Case 6                               'Delinquent on mortgage
                .Range("Delinquent_On_Mortgage") = aNodeList.item(i).innerText
            Case 7                               'Median list price / sq ft
            Case 8                               'Median sale price / sq ft
            Case 9
            Case 10                              'Value Listings with price cut
                .Range("Price_Cut") = aNodeList.item(i).innerText
            Case 11                              'Breakeven horizon
                .Range("Breakeven") = aNodeList.item(i).innerText
            Case 12
                .Range("Rent_List_Price") = aNodeList.item(i).innerText ' Rent list price
            Case 13                              'List price / sq ft
                .Range("Rent_sq") = aNodeList.item(i).innerText
            End Select
        Next i

        URL = "https://datausa.io/profile/geo/" & searchString
        html.body.innerHTML = GetHTML(URL)
        ThisWorkbook.Worksheets("Engine").Range("Population") = html.querySelector(".stat-value").innerText
    End With

    Application.ScreenUpdating = True
End Sub
Public Function GetHTML(ByVal URL As String) As String
    Dim sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    GetHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function

工作表2中的结果:

结果


所需参考资料:

HTML 对象库


推荐阅读