首页 > 解决方案 > 带有 Excel Internet Explorer 的 vba 不使用新打开的选项卡来获取 getelemtsbytagname

问题描述

我在 Microsoft Excel 中创建了 VBA 代码。我想要代码做的是:

  1. 去一个网站。
  2. 点击网站上的链接。
  3. 转到新打开的选项卡,然后单击新选项卡上的下载。
  4. 然后回到根/第一页并单击下一步并单击下载。
  5. 重复直到所有的根页面链接都被点击和下载。

我有下面的代码,那种作品。它使用 Internet Explorer,因为该网站不适用于 google chrome,但如果效果更好,我会尝试。

代码确实导航,确实打开了一个选项卡......这段代码没有做的是开始使用新页面/选项卡中的数据,所有元素都来自根页面。我尝试制作一个新的 IE 实例,选择 by item(1)。我不确定我需要做什么才能instancehyperlinks引用新创建的选项卡。仅供参考,由于根站点的编写方式,数据出现在新选项卡中,我无法控制任何 HTML。

这是代码:

Sub getalllinks()

    Dim ie As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    url_name = "123.123.123.123"
    ie.navigate url_name
    Do
        DoEvents
    Loop Until ie.readystate = 4 ' wait until fully loaded
    Set allhyperlinks = ie.document.getelementsbytagname("A")
    For Each hyper_link In allhyperlinks
        If hyper_link.Title = "View Subject" Then
            hyper_link.Click
            Do
                DoEvents
            Loop Until ie.readystate = 4 ' wait until fully loaded
            Set instancehyperlinks = ie.document.getelementsbytagname("A")
            For Each hyper_linkPage In instancehyperlinks
                If hyper_linkPage.Title = "Download" Then
                    hyper_linkPage.Click
                End If
            Next
        End If
    Next

End Sub

标签: vbaexcelinternet-explorerdomweb-scraping

解决方案


好的,我之前遇到过这个问题并解决了它,而无需使用任何第三方工具,例如 selenium我可能会考虑其他途径。

免责声明:我从其他来源找到并修改了很多此代码,由于许多原因,我现在无法追踪来源,如果我找到它们,我稍后会添加它们。

好的,首先您需要找到您的窗口,创建一个新模块并将其命名为“modWindowsAPI”并将其添加到其中,这将允许您的脚本挂钩到必要的 Windows API,不仅可以找到窗口,还可以下载:

Option Explicit

Public Declare Sub Sleep Lib "kernel32" _
    (ByVal dwMilliseconds As Long)

Public Declare Function FindWindow Lib "user32" Alias "FindWindowA" _
    (ByVal lpClassName As String, ByVal lpWindowName As String) As Long

Public Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" _
    (ByVal hWnd1 As Long, ByVal hWnd2 As Long, ByVal lpsz1 As String, ByVal lpsz2 As String) As Long

Public Declare Function SetForegroundWindow Lib "user32" _
    (ByVal hWnd As Long) As Long

Public Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

Public Declare Function SendMessageByString Lib "user32" Alias "SendMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String) As Long

Public Declare Function PostMessage Lib "user32" Alias "PostMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

Public Declare Sub keybd_event Lib "user32" _
    (ByVal bVk As Byte, ByVal bScan As Byte, ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Public Const BM_CLICK = &HF5
Public Const WM_SETTEXT = &HC
Public Const WM_GETTEXT = &HD
Public Const WM_GETTEXTLENGTH = &HE

Public Const VK_KEYDOWN = &H0
Public Const VK_KEYUP = &H2
Public Const VK_CONTROL = &H11

从这里有两种锁定窗口的方法(据我所知,可能还有更多),通过 URL 或窗口标题找到它。

通过 URL 查找窗口: 创建函数 GetOpenIEByTitle:

Function GetOpenIEByTitle(i_Title As String, Optional ByVal i_ExactMatch As Boolean = True) As SHDocVw.InternetExplorer
    Dim windowMatched As Boolean
    Dim e_title As String
    windowMatched = False
    Dim windowTimeout As Integer
    windowTimeout = 0

    Do Until windowMatched = True Or windowTimeout = 10
        If i_ExactMatch = False Then i_Title = "*" & i_Title & "*"
        'ignore errors when accessing the document property
        On Error Resume Next
        'loop over all Shell-Windows
        For Each GetOpenIEByTitle In objShellWindows
            'if the document is of type HTMLDocument, it is an IE window
            If TypeName(GetOpenIEByTitle.Document) = "HTMLDocument" Then
            'check the title
                If GetOpenIEByTitle.Document.Title Like i_Title Then
                    'leave and set boolean as true, we found the right window
                    windowMatched = True
                    Sleep 600
                    Exit Function
                End If
            End If
        Next
    windowTimeout = windowTimeout + 1
    Loop
End Function

通过 URL 查找窗口: 创建名为 GetOpenIEByURL 的函数

Function GetOpenIEByURL(ByVal i_URL As String) As SHDocVw.InternetExplorer
    Dim urlMatched As Boolean
    urlMatched = False
    Dim urlTimeout As Integer
    urlTimeout = 0
    Do Until urlMatched = True Or urlTimeout = 30

        Dim objShellWindows As New SHDocVw.ShellWindows

        'ignore errors when accessing the document property
        On Error Resume Next
        'loop over all Shell-Windows
        For Each GetOpenIEByURL In objShellWindows
            'if the document is of type HTMLDocument, it is an IE window
            If TypeName(GetOpenIEByURL.Document) = "HTMLDocument" Then
            'check the URL
                If GetOpenIEByURL.Document.URL = i_URL Then
                    'leave, we found the right window
                    urlMatched = True
                    Exit Function
                End If
            End If
        Next

    urlTimeout = urlTimeout + 1
    Loop
End Function

总结 你在正确的道路上需要多个 IE 对象,每个活动窗口都需要它自己的对象,如果你关闭它并转到下一个对象,你可以重用同一个对象。

调用上述方法之一,如下所示:

Set ieAppChild = GetOpenIEByTitle("Some Title", False)
Set ieAppChild = GetOpenIEByURL("https://127.0.0.1")

编辑:忘记提及当您准备关闭 IE 窗口以移至下一个窗口时,不要忘记调用 ieAppChild.Quit 并且您不必在重用之前将 ie 子对象设置为空,但是,它不是最佳做法。

最后是找到下载窗口并点击保存的功能:

Function SaveAs()
    Dim hWnd As Long

    Dim timeout As Date

    'Debug.Print "File_Download_Click_Save"

    'Find the File Download window, waiting a maximum of 30 seconds for it to appear

    timeout = Now + TimeValue("00:00:30")

    Do
        hWnd = FindWindow("#32770", "File Download")
        DoEvents
        Sleep 200
    Loop Until hWnd Or Now > timeout

    'Debug.Print "   File Download window "; Hex(hWnd)
    If hWnd Then
        'Find the child Save button
        hWnd = FindWindowEx(hWnd, 0, "Button", "&Save")
        'Debug.Print "   Save button "; Hex(hWnd)
    End If


    If hWnd Then

        'Click the Save button
        SetForegroundWindow (hWnd)
        Sleep 600  'this sleep is required and 600 miiliseconds seems to be the minimum that works
        SendMessage hWnd, BM_CLICK, 0, 0
    End If
End Function

推荐阅读