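html - Scraping a multi-page web table with VBA
Problem description
I'm trying to use VBA to save the second page of a table into Excel, but I can't get the Click method to work. Could you please help me? I've searched everywhere online without success. Thanks.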
Sub BrowseSiteTableObjectX()
    Dim IE As New SHDocVw.InternetExplorer
    Dim Docm As MSHTML.HTMLDocument
    Dim HTMLAtab As MSHTML.IHTMLElement
    Dim HTMLArow As MSHTML.IHTMLElement
    Dim iRow As Long

    With IE
        .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop
    End With

    Set Docm = IE.document
    Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0).Click
    Set Docm = IE.document
    Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)
    For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
        iRow = iRow + 1
        Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
        Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
        DoEvents
    Next HTMLArow

    IE.Quit
    Set IE = Nothing
    Set Docm = Nothing
End Sub
Solution
Pagination can sometimes be tricky, but on this page it is easy. I also fixed a few other issues. Please read the comments in the code:
Sub BrowseSiteTableObjectX()
    'Requires references to "Microsoft Internet Controls" (SHDocVw)
    'and "Microsoft HTML Object Library" (MSHTML)
    Dim IE As New SHDocVw.InternetExplorer
    Dim Docm As MSHTML.HTMLDocument
    Dim HTMLAtab As MSHTML.IHTMLElement
    Dim HTMLArow As MSHTML.IHTMLElement
    Dim nodePaginationNext As Object 'I always use late binding for nodes like this
    Dim iRow As Long
    Dim lastPage As Boolean

    With IE
        'Set the following line to False to make IE invisible.
        'You can also put IE in full screen, scroll to the page
        'counter and watch it advance. I give each page 5 seconds
        'to load; from what I have seen, this is at least partly necessary
        .Visible = True
        .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
        Do While .Busy Or .readyState <> 4: DoEvents: Loop
    End With

    'The page loads its data after IE reports that it is ready,
    'so you need a manual pause of a few seconds
    'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
    Application.Wait (Now + TimeSerial(0, 0, 5))
    Set Docm = IE.document

    'You need a loop to go through all pages
    '(IE can be temperamental: you may have to restart it on every pass of the loop.
    'But for the given URL it worked for 312 pages with the 5-second pause)
    Do
        'If you click the "next" link here, you are on the second page before you
        'have read any data. You must click only after reading the data of the current page
        '
        'Give the new page a few seconds to load after the click
        Application.Wait (Now + TimeSerial(0, 0, 5))
        Set Docm = IE.document
        Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)
        For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
            iRow = iRow + 1
            Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
            Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
            'DoEvents 'Not needed here, so commented out
        Next HTMLArow

        'You can't click the li tag. You must click the link, which is the first child of the li tag.
        'You must also know when the last page is reached: that is when the CSS class changes to "next disabled"
        Set nodePaginationNext = Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0)

        'Check whether the class has changed to "next disabled".
        'Asking for "next" still finds the element on the last page because
        'getElementsByClassName matches individual class names: an element with
        'class="next disabled" has both the class "next" and the class "disabled"
        If nodePaginationNext.getAttribute("class") = "next disabled" Then
            'Last page reached: end the loop
            lastPage = True
        Else
            'Not the last page: click through to the next page
            nodePaginationNext.FirstChild.Click
        End If
    Loop Until lastPage

    IE.Quit
    Set IE = Nothing
    Set Docm = Nothing
End Sub
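The last-page test above compares the whole class attribute with the exact string "next disabled", which stops working if the site ever adds another class or changes their order. A minimal sketch of a token-based check follows, assuming the class string is space-separated. `HasClassToken` is a hypothetical helper name, not part of MSHTML or of the answer above.

```vba
' Hypothetical helper, not part of MSHTML or the original answer.
' Returns True when classList (a whitespace-separated class string such as
' "next disabled") contains className as a whole token. Like
' getElementsByClassName, it matches whole tokens: "next" matches
' "next disabled" but not "nextpage". The comparison is case-sensitive.
Public Function HasClassToken(ByVal classList As String, _
                              ByVal className As String) As Boolean
    Dim token As Variant

    ' Treat tabs and line breaks like spaces before splitting
    classList = Replace(classList, vbTab, " ")
    classList = Replace(classList, vbCr, " ")
    classList = Replace(classList, vbLf, " ")

    ' Empty tokens from repeated spaces never match a non-empty className
    For Each token In Split(classList, " ")
        If StrComp(token, className, vbBinaryCompare) = 0 Then
            HasClassToken = True
            Exit Function
        End If
    Next token
    ' No match: the function result keeps its default value, False
End Function
```

To use it, put the function in a standard module and change the last-page test to pass the pagination element's `className` property together with "disabled", for example `If HasClassToken(<pagination element>.className, "disabled") Then`. The loop then ends whenever "disabled" is present, regardless of which other classes the element carries.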