Unable to collect the links of different properties from a webpage

Question

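I've written a script in VBA to get only the links of the different properties listed under the Single Family Homes heading in the right-hand area of a webpage. When I run the script I get nothing, and no error either. The content I want to scrape is static and present in the page source, so XMLHttpRequest should be enough. The selector I've defined in the script looks correct to me, yet I still can't fetch the property links.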

Webpage address: https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/

I've tried with:

Sub GetLinks()
    Const link$ = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
    Dim oHttp As New XMLHTTP60, Html As New HTMLDocument
    Dim I&

    With oHttp
        .Open "GET", link, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Html.body.innerHTML = .responseText
        With Html.querySelectorAll("article > a.list-card-info")
            For I = 0 To .Length - 1
                Sheet1.Range("A1").Offset(I, 0) = .item(I).getAttribute("href")
            Next I
        End With
    End With
End Sub

The expected links look like this:

https://www.zillow.com/homedetails/3446-NW-15th-St-Miami-FL-33125/43822210_zpid/
https://www.zillow.com/homedetails/1877-NW-22nd-Ave-Miami-FL-33125/43823838_zpid/
https://www.zillow.com/homedetails/1605-NW-8th-Ter-Miami-FL-33125/43825765_zpid/

How can I get all the links of the different properties from the landing page at the address above?

Tags: vba, web-scraping, queryselector

Answer


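Use the child element's class on its own, i.e. drop the `article >` parent part of the selector and select on `.list-card-info` alone. Note that there are many other things I would change about the code, but I know you like to keep your own structure and style.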

Sub GetLinks()
    Const link$ = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
    Dim oHttp As New XMLHTTP60, Html As New HTMLDocument
    Dim I&

    With oHttp
        .Open "GET", link, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Html.body.innerHTML = .responseText

        With Html.querySelectorAll(".list-card-info")
            For I = 0 To .Length - 1
                Sheet1.Range("A1").Offset(I, 0) = .item(I).getAttribute("href")
            Next I
        End With
    End With
End Sub
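
To check which selector actually matches anything, it can help to count the nodes each candidate selector returns from the parsed response. The following is a minimal diagnostic sketch and not part of the original answer: the procedure name CompareSelectors and the list of candidate selectors are illustrative assumptions, and it relies on the same library references as the code above. It also prints the HTTP status, since a blocked or redirected request would leave the sheet empty as well.

```vba
' Hypothetical diagnostic helper: prints the HTTP status and the number of
' nodes matched by each candidate selector to the Immediate window.
Sub CompareSelectors()
    Const link$ = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
    Dim oHttp As New XMLHTTP60, Html As New HTMLDocument
    Dim selectors As Variant, s As Variant

    With oHttp
        .Open "GET", link, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        Debug.Print "HTTP status:", .Status
        Html.body.innerHTML = .responseText
    End With

    ' Candidate selectors, from most to least specific (assumed list)
    selectors = Array("article > a.list-card-info", "a.list-card-info", ".list-card-info")

    For Each s In selectors
        Debug.Print s, Html.querySelectorAll(CStr(s)).Length
    Next s
End Sub
```

Run it from the VBA editor and read the counts in the Immediate window (Ctrl+G). A selector that reports 0 matches nothing in the document as MSHTML parsed it.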

Some changes I would probably make:

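' Early binding requires references (Tools > References) to
' "Microsoft XML, v6.0" and "Microsoft HTML Object Library".
Private Sub GetLinks()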
    Const LINK As String = "https://www.zillow.com/homes/for_sale/33125/house_type/12_zm/0_mmm/"
    Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    Dim i As Long, links As Object

    Set http = New MSXML2.XMLHTTP60: Set html = New MSHTML.HTMLDocument

    With http
        .Open "GET", LINK, False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
        html.body.innerHTML = .responseText
    End With

    Set links = html.querySelectorAll(".list-card-info")

    With ThisWorkbook.Worksheets("Sheet1")
        For i = 0 To links.Length - 1
            .Cells(i + 1, 1) = links.item(i).href
        Next i
    End With
End Sub
