Scraping a multi-page web table with VBA

Problem description

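I am trying to use VBA to save the second page of a table to Excel, but I cannot get the Click method to work. Can you help me? I have searched everywhere online without finding an answer. Thanks.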

Sub BrowseSiteTableObjectX()
    Dim IE As New SHDocVw.InternetExplorer
    Dim Docm As MSHTML.HTMLDocument
    Dim HTMLAtab As MSHTML.IHTMLElement
    Dim HTMLArow As MSHTML.IHTMLElement
    Dim iRow As Long

    With IE
        .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
        Do While .Busy Or .readyState <> 4
           DoEvents
        Loop
    End With

    Set Docm = IE.document

    Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0).Click

    Set Docm = IE.document

    Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)

    For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
        iRow = iRow + 1
        Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
        Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
        DoEvents
    Next HTMLArow

    IE.Quit
    Set IE = Nothing
    Set Docm = Nothing
End Sub

Tags: html, excel, vba

Solution


Pagination can be tricky, but on this page it is easy. I also fixed a few other problems. Please read the comments in the code:

Sub BrowseSiteTableObjectX()
  Dim IE As New SHDocVw.InternetExplorer
  Dim Docm As MSHTML.HTMLDocument
  Dim HTMLAtab As MSHTML.IHTMLElement
  Dim HTMLArow As MSHTML.IHTMLElement
  Dim nodePagiantionNext As Object 'Declared As Object because I always use late binding for nodes like this
  Dim iRow As Long
  Dim lastPage As Boolean

  With IE
    'Set the following line to False to make IE invisible.
    'You can also set IE to full screen, scroll down to the page
    'counter and watch it advance. Each page gets 5 seconds
    'to load. From what I have seen, at least part of that wait is necessary
    .Visible = True
    .navigate "https://www.nasdaq.com/market-activity/stocks/screener"
    Do While .Busy Or .readyState <> 4: DoEvents: Loop
  End With
  'The page keeps loading data after IE reports that it is ready, so a manual pause of a few seconds is needed
  'Application.Wait (Now + TimeSerial(pause_hours, pause_minutes, pause_seconds))
  Application.Wait (Now + TimeSerial(0, 0, 5))

  Set Docm = IE.document

  'A loop is needed to go through all pages
  '(IE is a diva. You may have to restart it on every loop iteration. But for the given URL
  'it works for 312 pages with the 5 second pause)
  Do
    'If you click the 'next' link here, you are already on the second page before any data has been read.
    'The click must come after the data of the current page has been read
    '
    'Give the page a few seconds to load after the click
    Application.Wait (Now + TimeSerial(0, 0, 5))

    Set Docm = IE.document

    Set HTMLAtab = Docm.getElementsByClassName("symbol-screener__table")(0)

    For Each HTMLArow In HTMLAtab.getElementsByClassName("symbol-screener__row")
      iRow = iRow + 1
      Cells(iRow, 1) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--ticker")(0).innerText
      Cells(iRow, 2) = HTMLArow.getElementsByClassName("symbol-screener__cell symbol-screener__cell--company")(0).innerText
      'DoEvents 'Not needed inside this loop
    Next HTMLArow

    'You can't click the li tag itself. You must click the link that is the first child of the li tag.
    'You also need to know when the last page is reached. That is when the CSS class changes to "next disabled"
    Set nodePagiantionNext = Docm.getElementsByClassName("symbol-screener__pagination")(0).getElementsByClassName("next")(0)
    '
    'Check whether the CSS class has changed to "next disabled".
    'Short explanation: the lookup above asks for "next", so it must also
    'find the node once its class is "next disabled". It does.
    'getElementsByClassName matches every element that has "next" as one of
    'its space-separated class names, and "next disabled" carries two of them:
    '"next" and "disabled"
    If nodePagiantionNext.getAttribute("class") = "next disabled" Then
      'If last page end loop
      lastPage = True
    Else
      'If not the last page, click for next page
      nodePagiantionNext.FirstChild.Click
    End If
  Loop Until lastPage

  IE.Quit
  Set IE = Nothing
  Set Docm = Nothing
End Sub
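
The comparison with the literal string "next disabled" in the code above only succeeds when the class attribute is exactly that text. As a minimal sketch, assuming the site might one day add another class or change the order, a small helper can test for the single class name instead. `HasClassToken` is a hypothetical name that is not part of the original answer, and it only splits on spaces (tabs or line breaks inside the attribute are not handled):

```vba
'Hypothetical helper, not part of the original answer.
'Returns True if classAttr contains token as one of its
'space-separated class names (case-sensitive, whole names only).
Function HasClassToken(ByVal classAttr As String, ByVal token As String) As Boolean
  Dim part As Variant

  'Split on spaces; an empty attribute yields an empty array, so the loop is skipped
  For Each part In Split(Trim$(classAttr), " ")
    If StrComp(CStr(part), token, vbBinaryCompare) = 0 Then
      HasClassToken = True
      Exit Function
    End If
  Next part
End Function
```

Inside the loop, the last-page check could then read `If HasClassToken(nodePagiantionNext.className, "disabled") Then`.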

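The fixed 5 second pauses are simple, but they add up over a few hundred pages. A possible alternative, sketched here under the assumption that the text of the first table row differs from one page to the next, is to poll until that text changes and give up after a timeout. `WaitForFirstRowChange` and `ElapsedSeconds` are hypothetical helpers that are not part of the original answer, and the sketch has not been run against the live site:

```vba
'Hypothetical helper: seconds between two readings of VBA's Timer function.
'Timer restarts at 0 at midnight, so a negative difference is corrected by one day.
Function ElapsedSeconds(ByVal startTime As Double, ByVal nowTime As Double) As Double
  Dim elapsed As Double

  elapsed = nowTime - startTime
  If elapsed < 0 Then elapsed = elapsed + 86400
  ElapsedSeconds = elapsed
End Function

'Hypothetical helper: polls doc until the first element with class rowClass
'has text that is non-empty and different from previousText.
'Returns True when that happens and False when timeoutSeconds have passed.
Function WaitForFirstRowChange(ByVal doc As Object, ByVal rowClass As String, _
                               ByVal previousText As String, _
                               Optional ByVal timeoutSeconds As Double = 15) As Boolean
  Dim startTime As Double
  Dim rows As Object
  Dim currentText As String

  startTime = Timer
  Do
    DoEvents 'Let IE and Excel process pending work while polling
    Set rows = doc.getElementsByClassName(rowClass)
    If rows.Length > 0 Then
      currentText = rows.Item(0).innerText
      If Len(currentText) > 0 And currentText <> previousText Then
        WaitForFirstRowChange = True
        Exit Function
      End If
    End If
  Loop While ElapsedSeconds(startTime, Timer) < timeoutSeconds
End Function
```

To use it, store the first row's `innerText` before the click and call `WaitForFirstRowChange(IE.document, "symbol-screener__row", thatText)` in place of the `Application.Wait` line. For the first page, pass an empty string as `previousText`. The loop polls without sleeping, so it keeps one CPU core busy while it waits.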