vba - Avoiding a Google IP block when scraping Wikipedia URLs
Problem description
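For my master's thesis I need to get the Wikipedia URLs for a list of about 20,000 actors. sktneer helped me get a first version of the code running. Thanks again! (See: Get Wikipedia page url from Excel list)
The one remaining problem is that Google blocks my queries after 150-200 actors. One idea is to add an Application.Wait command to the code so that there is a 2-3 second pause before each new query.
Would that work, and if so, could you help me embed it in the code?
Or is this the wrong approach, and is there a simpler solution?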
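The pause asked about above can be sketched as two small helpers. This is a minimal sketch, assuming Excel VBA, where `Application.Wait` blocks Excel until a given time is reached. `RandomDelaySeconds` and `PauseBetweenQueries` are hypothetical names, not part of the original code, and the delay is kept to whole seconds (2 or 3) by building it with `TimeSerial`.

```vba
' Hypothetical helper: a random whole number of seconds in
' [minSeconds, maxSeconds]. Int(Rnd * n) yields 0..n-1.
Public Function RandomDelaySeconds(ByVal minSeconds As Long, ByVal maxSeconds As Long) As Long
    RandomDelaySeconds = minSeconds + Int(Rnd * (maxSeconds - minSeconds + 1))
End Function

' Hypothetical helper: block Excel for a random 2-3 seconds (by default)
' so consecutive search requests are spaced out.
Public Sub PauseBetweenQueries(Optional ByVal minSeconds As Long = 2, _
                               Optional ByVal maxSeconds As Long = 3)
    Dim secs As Long
    secs = RandomDelaySeconds(minSeconds, maxSeconds)
    ' Wait until "now + secs seconds"; Excel is unresponsive meanwhile.
    Application.Wait Now + TimeSerial(0, 0, secs)
End Sub
```

To use it, call `Randomize` once at the start of the macro and `PauseBetweenQueries 2, 3` as the last statement inside the `For i = 2 To lastRow` loop. Whether 2-3 seconds keeps you under Google's limit is not known; for about 20,000 rows it also adds roughly 11 to 17 hours of waiting (40,000 to 60,000 seconds).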
Code:
```vba
Sub XMLHTTP()
    Dim url As String, lastRow As Long
    Dim http As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
    Dim start_time As Date, end_time As Date
    Dim i As Long
    Dim str_text As String
    Dim found As Boolean

    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    ' Now (date + time) instead of Time, so the duration stays correct
    ' if a long run passes midnight.
    start_time = Now
    Debug.Print "start_time: " & start_time

    ' One request object can be reused for every query.
    Set http = CreateObject("MSXML2.ServerXMLHTTP")

    For i = 2 To lastRow
        ' EncodeURL (Excel 2013 or later) escapes spaces and special characters in the name.
        url = "https://www.google.de/search?q=" & WorksheetFunction.EncodeURL(Cells(i, 1).Value) & _
              "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
        http.Open "GET", url, False
        http.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        http.send

        found = False
        If http.Status = 200 Then
            Set html = CreateObject("htmlfile")
            html.body.innerHTML = http.responseText
            ' "rso" is the container holding Google's search results.
            Set objResultDiv = html.getElementById("rso")
            If Not objResultDiv Is Nothing Then
                ' First result heading and the link inside it (either may be missing).
                Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
                If Not objH3 Is Nothing Then
                    Set link = objH3.getElementsByTagName("a")(0)
                    If Not link Is Nothing Then
                        ' Strip the <EM> highlighting around matched words.
                        str_text = Replace(link.innerHTML, "<EM>", "")
                        str_text = Replace(str_text, "</EM>", "")
                        Cells(i, 2) = str_text
                        Cells(i, 3) = link.href
                        found = True
                    End If
                End If
            End If
        End If

        If Not found Then
            Cells(i, 2) = "Not Found"
            Cells(i, 3) = "Not Found"
        End If

        ' Keep Excel responsive on every iteration, not only on hits.
        DoEvents
    Next i

    end_time = Now
    Debug.Print "end_time: " & end_time
    Debug.Print "done. Time taken (minutes): " & DateDiff("n", start_time, end_time)
    MsgBox "done. Time taken (minutes): " & DateDiff("n", start_time, end_time)
End Sub
```
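In the macro, every non-200 response is written as "Not Found", so once Google starts blocking, all remaining rows are silently marked as missing instead of being retried. A minimal sketch of retrying with exponential backoff follows, assuming the block shows up as a non-200 HTTP status. `GetWithBackoff` and `BackoffSeconds` are hypothetical names, and the defaults (4 attempts, first wait 30 seconds, doubling) are arbitrary.

```vba
' Hypothetical helper: seconds to wait after attempt number `attempt`
' (1-based): firstWaitSeconds, then 2x, 4x, ...
Public Function BackoffSeconds(ByVal firstWaitSeconds As Long, ByVal attempt As Long) As Long
    BackoffSeconds = firstWaitSeconds * 2 ^ (attempt - 1)
End Function

' Hypothetical helper: GET `url`; on a non-200 status, wait and retry with
' growing pauses. Returns the response body, or "" if every attempt fails.
' Network errors (no connection, timeout) raise a run-time error and are not handled.
Public Function GetWithBackoff(ByVal url As String, _
                               Optional ByVal maxAttempts As Long = 4, _
                               Optional ByVal firstWaitSeconds As Long = 30) As String
    Dim http As Object
    Dim attempt As Long

    Set http = CreateObject("MSXML2.ServerXMLHTTP")

    For attempt = 1 To maxAttempts
        http.Open "GET", url, False
        http.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        http.send

        If http.Status = 200 Then
            GetWithBackoff = http.responseText
            Exit Function
        End If

        ' Blocked or other error: pause before the next attempt (not after the last one).
        If attempt < maxAttempts Then
            Application.Wait Now + TimeSerial(0, 0, BackoffSeconds(firstWaitSeconds, attempt))
        End If
    Next attempt

    GetWithBackoff = ""
End Function
```

With the defaults, the pauses are 30, 60 and 120 seconds. Inside the loop, the `Open`/`send`/`Status` lines could be replaced by a call to `GetWithBackoff(url)`, treating an empty string as "Not Found". If the block lasts longer than all retries combined, the row is still marked as missing.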