Get image links from Google search

Problem description

I'm trying to get the image links from a Google image search results page. Here is my attempt:

' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
Sub Test()
    Const sURL As String = "https://www.google.com.eg/search?q=baby&sxsrf=ALeKk01tyfvvxyYjaC0YctjxaY0RlvPnuw:1586804351129&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjB77TtiuboAhUl5uAKHR5KA2wQ_AUoAXoECBQQAw&biw=1280&bih=881"
    Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument

    Set http = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    ' Download the results page synchronously and load it into an HTML document
    With http
        .Open "GET", sURL, False
        .send
        html.body.innerHTML = .responseText
    End With

    Dim post As Object, i As Long

    ' Thumbnail containers inside the result blocks
    Set post = html.querySelectorAll(".mM5pbd .bRMDJf")

    For i = 0 To post.Length - 1
        Debug.Print post.Item(i).innerHTML
    Next i

    ' Pause here to inspect the results
    Stop
End Sub

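First, post.Length is only 20, whereas I expected around 300. Second, I can't get the proper image links, because they seem to be base64-encoded or something similar (I'm not sure). How can I get the real image links, and get the links for all the related images?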

I think I've partly solved it:

Set post = html.querySelectorAll("a.VFACy.kGQAp")

For i = 0 To post.Length - 1
    Debug.Print post.Item(i).href
Next i

But how can I get all the links instead of only 20? Also, the links are not quite right. For example, I got this link:

https://www.fool.com/taxes/2018/03/27/are-you-having-a-baby-here-are-the-tax-breaks-you.aspx

whereas the correct link is:

https://g.foolcdn.com/editorial/images/466737/new-parents-holding-newborn-baby-mom-dad-father-mother.jpg
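One possible approach is to scan the raw `responseText` for quoted image URLs instead of reading the result anchors, which point to the hosting page as shown above. This is offered only as a hedged sketch. It assumes, without confirmation from the question, that Google's results page embeds the original image URLs in inline script data as triples like `["https://example.com/image.jpg",900,1600]` (URL, height, width). That markup is undocumented and can change at any time. Matched URLs may also contain JSON escapes such as `\u003d` that would still need unescaping. The function name `ExtractImageUrls` is hypothetical. It uses late-bound `VBScript.RegExp` and `Scripting.Dictionary`, so no extra reference is needed.

```vba
' Hypothetical helper: collects URLs that appear in the pattern ["<url>",<int>,<int>].
' ASSUMPTION: Google embeds original image URLs this way in inline scripts; this is
' undocumented and may stop working whenever Google changes its markup.
Public Function ExtractImageUrls(ByVal pageText As String) As Collection
    Dim re As Object, matches As Object, m As Object
    Dim urls As Collection, seen As Object

    Set urls = New Collection
    Set seen = CreateObject("Scripting.Dictionary")   ' used only to skip duplicates

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Group 1 = a quoted http(s) URL followed by two integers
    re.Pattern = "\[""(https?://[^""]+)"",\d+,\d+\]"

    Set matches = re.Execute(pageText)
    For Each m In matches
        If Not seen.Exists(m.SubMatches(0)) Then
            seen.Add m.SubMatches(0), True
            urls.Add m.SubMatches(0)
        End If
    Next m

    Set ExtractImageUrls = urls
End Function
```

In the first `Test` routine, it could be called after `.send` with `For Each u In ExtractImageUrls(http.responseText)` and `Debug.Print u` (with `Dim u As Variant`).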

I also tried using IE:

' Requires a reference to "Microsoft Internet Controls"
Sub TestIE()
    Dim ie As New InternetExplorer
    Dim lastrow As Long
    Dim i As Long
    Dim j As Long
    lastrow = Range("A" & Rows.Count).End(xlUp).Row

    For i = 2 To lastrow
        ie.Visible = True
        ie.navigate "https://www.google.com.eg/search?q=baby&sxsrf=ALeKk01tyfvvxyYjaC0YctjxaY0RlvPnuw:1586804351129&source=lnms&tbm=isch&sa=X&ved=2ahUKEwjB77TtiuboAhUl5uAKHR5KA2wQ_AUoAXoECBQQAw&biw=1280&bih=881"
        ' Wait until the page has finished loading
        While ie.Busy Or ie.readyState < 4: DoEvents: Wend

        Dim post As Object
        Set post = ie.document.querySelectorAll("a.VFACy.kGQAp")

        For j = 0 To post.Length - 1
            ' Note: this indexes with the outer counter i, not j,
            ' so the same element is printed on every pass
            Debug.Print post.Item(i).innerHTML
        Next j
    Next i
End Sub

But in the results, every item gives the same innerHTML:

<div class="sMi44c lNHeqe"><div class="WGvvNb" dir="ltr">Baby colic - Wikipedia</div><div class="fxgdke"><span dir="ltr">en.wikipedia.org</span></div> 
</div>

Does querySelectorAll behave differently when working with IE?
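The IE loop reads `post.Item(i)` while iterating with `j`. Because `i` is the outer row counter (2 on the first pass), every iteration reads the same node. That would explain the identical output, and `querySelectorAll` itself is unlikely to be the cause.

Below is a minimal sketch of the inner loop indexed correctly. It is written as a hypothetical helper, `CollectHrefs`, that accepts any HTML document object (IE's `ie.document` or an `MSHTML.HTMLDocument`) and is late-bound. The selector `a.VFACy.kGQAp` comes from the question and depends on Google's current, obfuscated class names.

```vba
' Hypothetical helper: returns the href of every element matched by the selector.
' The NodeList is 0-based and is indexed with its own loop counter j.
Public Function CollectHrefs(ByVal doc As Object, ByVal selector As String) As Collection
    Dim nodes As Object, j As Long
    Dim hrefs As Collection
    Set hrefs = New Collection

    Set nodes = doc.querySelectorAll(selector)
    For j = 0 To nodes.Length - 1
        hrefs.Add nodes.Item(j).href
    Next j

    Set CollectHrefs = hrefs
End Function
```

With IE this would be called as `Set links = CollectHrefs(ie.document, "a.VFACy.kGQAp")`. The same fix (`Item(j)`) applies to the `.bRMDJf img` loop.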

Another attempt:

    Dim post As Object
    Dim r As Long

    Set post = ie.document.querySelectorAll(".bRMDJf img")

    For j = 0 To post.Length - 1
        r = r + 1
        ' Note: this indexes with i rather than j, so every row receives the same src
        Cells(r, 1).Value = post.Item(i).getAttribute("src")
    Next j

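Now I get 100 results, but no links: the images are base64-encoded, and every image gives the same output. I can decode the images, but the quality is very low. I also only get 100 results. How can I increase the number of results and get the correct links?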
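A `src` of the form `data:image/...;base64,...` holds the image bytes themselves, so it contains no link to extract. This hedged sketch falls back to other attributes when `src` is a data URI.

The attribute names `data-src` and `data-iurl` are assumptions about Google's lazy-loading markup, not something the question confirms. Even when present, they are likely to point to Google-hosted thumbnails rather than to the original image. The helper name `ImageUrlOf` is hypothetical.

```vba
' Hypothetical helper: returns a usable URL for an <img> element, or "" if none is found.
' ASSUMPTION: lazily loaded images may keep a URL in data-src or data-iurl.
Public Function ImageUrlOf(ByVal img As Object) As String
    Dim attrName As Variant, attrValue As Variant

    attrValue = img.getAttribute("src")
    ' A data: URI embeds the image (base64), so it is skipped
    If Not IsNull(attrValue) Then
        If Len(attrValue) > 0 And LCase$(Left$(attrValue, 5)) <> "data:" Then
            ImageUrlOf = attrValue
            Exit Function
        End If
    End If

    ' Fall back to attributes that might hold a lazily loaded URL
    For Each attrName In Array("data-src", "data-iurl")
        attrValue = img.getAttribute(attrName)
        If Not IsNull(attrValue) Then
            If Len(attrValue) > 0 Then
                ImageUrlOf = attrValue
                Exit Function
            End If
        End If
    Next attrName

    ImageUrlOf = ""
End Function
```

In the loop above, it would replace `getAttribute("src")`, as in `Cells(r, 1).Value = ImageUrlOf(post.Item(j))`.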


Tags: excel, vba, web-scraping
