excel - How to get data from the following websites using an XML request
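Question
Hello, I would like to get the patent number and abstract data from the following two websites:
I know how to scrape data from these sites with an HTML query (Internet Explorer automation). I would like to know whether there is a way to get the data with an XML request instead.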
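    Sub google()
        ' Requires references: Microsoft Internet Controls, Microsoft HTML Object Library
        Dim IE As New SHDocVw.InternetExplorer
        Dim HTMLDoc As MSHTML.HTMLDocument
        ' Each variable needs its own type; "Dim a, b As String" makes a a Variant
        Dim pageText As String, pageclaim As String
        Dim HTMLTable As MSHTML.IHTMLElement, HTMLp As MSHTML.IHTMLElement
        Dim HTMLTables As MSHTML.IHTMLElementCollection, HTMLps As MSHTML.IHTMLElementCollection
        Dim HTMLRow As MSHTML.IHTMLElement
        Dim HTMLCell As MSHTML.IHTMLElement
        Dim RowNum As Long, ColNum As Integer
        Dim pointer As Integer

        IE.Visible = True
        IE.navigate ""
        ' Wait until the page has finished loading; DoEvents keeps Excel responsive
        Do While IE.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set HTMLDoc = IE.Document
    End Sub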
Thank you for your help.
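Answer
Edit
I thought about it again and remembered that you can also send a User-Agent header with the request. That way you can get the HTML source of the Google link: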
    Sub google()
        ' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
        Dim http As New MSXML2.XMLHTTP60
        Dim htmlDoc As New MSHTML.HTMLDocument
        Dim url As String
        Dim fileNum As Integer

        url = "https://patents.google.com/patent/US8805587B1/en?oq=US8805587B1"
        http.Open "GET", url, False
        ' Identify the request as Internet Explorer 11
        http.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"
        http.Send
        htmlDoc.body.innerHTML = http.responseText

        ' Write the parsed HTML to a text file for inspection
        fileNum = FreeFile
        Open "D:\httpRequest.txt" For Output As #fileNum
        Print #fileNum, htmlDoc.body.outerHTML
        Close #fileNum
        'Debug.Print htmlDoc.body.outerHTML
    End Sub
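The request above could also be wrapped in a small function that checks the HTTP status before the response is used. This is only a minimal sketch: `FetchHtml` and its parameter names are hypothetical and not part of the original answer, and it assumes a reference to "Microsoft XML, v6.0" is set.

```vba
' Hypothetical helper (not from the original answer): returns the static HTML
' for a URL, or an empty string if the server does not answer with status 200.
' Assumes a reference to "Microsoft XML, v6.0".
Function FetchHtml(ByVal url As String, ByVal userAgent As String) As String
    Dim http As New MSXML2.XMLHTTP60

    http.Open "GET", url, False                     ' False = synchronous request
    http.setRequestHeader "User-Agent", userAgent   ' same header as in the answer
    http.send

    If http.Status = 200 Then
        FetchHtml = http.responseText
    Else
        ' Report the failure in the Immediate window and return nothing
        Debug.Print "Request failed: " & http.Status & " " & http.statusText
        FetchHtml = vbNullString
    End If
End Function
```

It would be called with the same URL and User-Agent string as in the answer, and the returned text could then be assigned to `htmlDoc.body.innerHTML` as before.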
First post (the part about the second link still applies)
~~Bad news. The first page does not work with an XML request:~~
    Sub google()
        ' Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library
        Dim http As New MSXML2.XMLHTTP60
        Dim htmlDoc As New MSHTML.HTMLDocument
        Dim url As String

        url = "https://patents.google.com/patent/US8805587B1/en?oq=US8805587B1"
        ' Synchronous GET request without a User-Agent header
        http.Open "GET", url, False
        http.Send
        htmlDoc.body.innerHTML = http.responseText
        Debug.Print htmlDoc.body.outerHTML
    End Sub
This is the result:
<BODY>
<DIV style="MAX-WIDTH: 590px; MARGIN: 64px auto 0px">
<H2>Your Browser Isn't Supported By Google Patents</H2>
<P>It looks like you're using an old browser which isn't supported by Google Patents. To use Google Patents, you'll need an up-to-date browser.
<A href="https://support.google.com/faqs/answer/6261372">Learn more</A>.
</P>
</DIV>
</BODY>
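The second page does not work with an XML request because it is a dynamic-content page. An XML request only sees static HTML, that is, the HTML the server delivers first in response to the URL call. Anything the page adds later (for example through scripts running in the browser) is not part of that response.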
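One way to find out whether a page can be handled with an XML request is to check whether the text you need is already present in the first delivered HTML. The following is a rough sketch under stated assumptions: `IsInStaticHtml` and `CheckPage` are hypothetical names, the URL and the search text are supplied by the caller, and a reference to "Microsoft XML, v6.0" is assumed.

```vba
' Hypothetical check (not from the original answer): True if the text we are
' looking for is already present in the HTML that the server delivers first.
Function IsInStaticHtml(ByVal html As String, ByVal marker As String) As Boolean
    ' vbTextCompare makes the search case-insensitive
    IsInStaticHtml = (InStr(1, html, marker, vbTextCompare) > 0)
End Function

' Fetches a page with a plain XML request and reports whether the marker
' text is part of the static HTML. Assumes a reference to "Microsoft XML, v6.0".
Sub CheckPage(ByVal url As String, ByVal marker As String)
    Dim http As New MSXML2.XMLHTTP60

    http.Open "GET", url, False   ' synchronous request
    http.send

    If IsInStaticHtml(http.responseText, marker) Then
        Debug.Print "Marker found in the static HTML."
    Else
        Debug.Print "Marker not found; the content may be loaded dynamically."
    End If
End Sub
```

If the marker is missing even though it is visible in the browser, the content is probably added after the first response, and browser automation as in the question may still be needed.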