首页 > 解决方案 > 从 HttpWebRequest 下载时缺少某些元素?

问题描述

我正在使用 httpwebrequest 从给定的 url 下载数据,但很少有元素没有响应。

   Dim Request As HttpWebRequest = CType(WebRequest.Create("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2"), HttpWebRequest)
    Request.Timeout = 2 * 60 * 1000
    Request.Proxy = Nothing
    Request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
    Request.Credentials = System.Net.CredentialCache.DefaultCredentials
    Dim HttpResp As HttpWebResponse
    HttpResp = (CType(Request.GetResponse(), HttpWebResponse))
    If HttpResp.StatusCode = HttpStatusCode.OK Then
        Dim receiveStream As Stream = HttpResp.GetResponseStream()
        Dim readStream As New StreamReader(receiveStream)
        Dim sData As String
        sData = readStream.ReadToEnd()
        readStream.Close()

    Else

    End If    

当我在 chrome 上打开 URL(https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2)并检查元素时,我可以看到这个元素(搜索 37005067200003B0F1FF2)但作为回应我我没有得到这个元素(搜索 37005067200003B0F1FF2)。

使用 webbrowser 控件的代码

Private Sub Button10_Click(sender As Object, e As EventArgs) Handles Button10.Click


    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
    Dim sURL As String = String.Format("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2")
    Dim webBrowserForPrinting As WebBrowser = New WebBrowser()
    webBrowserForPrinting.ScriptErrorsSuppressed = True
    AddHandler webBrowserForPrinting.DocumentCompleted, AddressOf PrintDocument
    webBrowserForPrinting.Url = New Uri(sURL)
    webBrowserForPrinting.Navigate(sURL)

End Sub
Private Sub PrintDocument(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
    Dim HTMD As HtmlDocument
    HTMD = CType(sender, WebBrowser).Document



    Dim HTC As HtmlElementCollection
    If HTMD IsNot Nothing Then
        HTC = HTMD.All
        For Each ele As HtmlElement In HTC
            MsgBox(ele.InnerHtml)

        Next
    End If

End Sub

标签: htmlvb.netiframewebbrowser-control

解决方案


您需要激活 WebBrowser 高级功能才能成功完成解析过程。如果未启用这些功能,则标准 IE7 仿真中的 WebBrowser 将无法完成文档。失败是由大量脚本错误引起的。

我添加了一个带有静态方法 ( WebBrowserAdvancedFetures) 的类,以将所需的值添加到注册表中。
WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures在 Form 的构造函数中调用。
你可以回滚调用WebBrowserAdvancedFetures.DeactivateWBAdvancedFeatures

此程序如何工作:

  1. 实例化一个WebBrowser类 ( Private browser As WebBrowser)。我们也可以使用 WebBrowser 控件(Form 容器可以托管的可见控件版本),它是一样的。
  2. 订阅其DocumentCompleted事件。每次HtmlDocuments内部主线之一WebBrowser.Document完成时,都会升起。阅读 如何在 Frames/IFrames 中获取 HtmlElement 值?有关 HtmlDocuments 嵌套的更多详细信息。
  3. DocumentCompleted处理程序中,验证至少有一个文档已准备好进行解析,并检查WebBrowser.ReadyState = WebBrowserReadyState.Complete
  4. 如果是,请搜索包含我们要查找的数据的 HtmlElements。
  5. 收集完所有数据后,引发一个事件,通知解析完成(如果需要,这也允许其他类的订阅者也收到通知。EventArgs不过,这需要一个自定义类)并禁用对HtmlDocument(在这里,这是通过设置一个布尔字段来完成的)。
  6. 处理新数据(这里只有 aString和 aDateTime对象),然后重置解析过程中使用的字段/变量。

Form.FormClosed请记住在事件或自定义类Dispose()方法中删除处理程序:

RemoveHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
RemoveHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted

Public Event DocumentParsingComplete As EventHandler(Of EventArgs)

Private browser As WebBrowser = Nothing
Private trackingNumberValue As String = String.Empty
Private trackingDateValue As DateTime
Private documentParsed As Boolean = False
Private userAgent As String = "User-Agent: Mozilla/5.0 (Windows NT 10; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0"

Public Sub New()
    InitializeComponent()
    WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures(Path.GetFileName(Application.ExecutablePath))
    browser = New WebBrowser With {.ScriptErrorsSuppressed = True}
    AddHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
    AddHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
End Sub

Private Sub btnNavigate_Click(sender As Object, e As EventArgs) Handles btnNavigate.Click
    browser.Navigate("")
    browser.Document.OpenNew(True)
    documentParsed = False
    browser.Navigate("[Some URL]", "_self", Nothing, userAgent)
End Sub

Private Sub OnDocumentParsingComplete(sender As Object, e As EventArgs)
    ' Do whatever you need with these
    Console.WriteLine(trackingNumberValue)
    Console.WriteLine(trackingDateValue)

    'Then reset for further use
    trackingNumberValue = String.Empty
    trackingDateValue = DateTime.MinValue
End Sub

Private Sub browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
    Dim wb As WebBrowser = DirectCast(sender, WebBrowser)
    If wb.ReadyState <> WebBrowserReadyState.Complete OrElse wb.Document.Forms.Count = 0 OrElse documentParsed Then Return

    Dim trackingNumberClass As String = "tracking-number-value"
    Dim trackingElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Contains(trackingNumberClass))
    Me.trackingNumberValue = trackingElement?.InnerText

    Dim trackingDateClass As String = "ng-binding ng-scope"
    Dim trackingDateElement = wb.Document.GetElementsByTagName("SPAN").
        OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Equals(trackingDateClass))

    If trackingDateElement IsNot Nothing Then
        Dim deliveryDate As String = trackingDateElement.InnerText.Split().Last().TrimEnd("."c)
        Me.trackingDateValue = Date.ParseExact(deliveryDate, "dd-MM-yyyy", Nothing)
        If Not String.IsNullOrEmpty(trackingNumberValue) Then
            documentParsed = True
            RaiseEvent DocumentParsingComplete(sender, EventArgs.Empty)
        End If
    End If
End Sub

使用此类激活/停用 WebBrowser 控件的高级功能:

Imports Microsoft.Win32
Imports System.Security.AccessControl

Public Class WebBrowserAdvancedFetures
    Private Shared baseKeyName As String = "Software\Microsoft\Internet Explorer\Main\FeatureControl"
    Private Shared featuresKey As String = baseKeyName & "\FEATURE_BROWSER_EMULATION"
    Private Shared hardwareAccelKey As String = baseKeyName & "\FEATURE_GPU_RENDERING"

    Public Shared Sub ActivateWBAdvancedFeatures(executableName As String)
        Dim wbFeatureKey As RegistryKey = Nothing
        Dim wbAccelKey As RegistryKey = Nothing

        Try
            wbFeatureKey = Registry.CurrentUser.OpenSubKey(featuresKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbFeatureKey Is Nothing Then
                wbFeatureKey = Registry.CurrentUser.CreateSubKey(featuresKey, True)
            End If
            wbFeatureKey.SetValue(executableName, 11001, RegistryValueKind.DWord)

            wbAccelKey = Registry.CurrentUser.OpenSubKey(hardwareAccelKey, 
                RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            If wbAccelKey Is Nothing Then
                wbAccelKey = Registry.CurrentUser.CreateSubKey(hardwareAccelKey, True)
            End If
            wbAccelKey.SetValue(executableName, 1, RegistryValueKind.DWord)
        Finally
            wbFeatureKey?.Dispose()
            wbAccelKey?.Dispose()
        End Try
    End Sub

    Public Shared Sub DeactivateWBAdvancedFeatures(executableName As String)
        Using wbFeatureKey = Registry.CurrentUser.OpenSubKey(
            featuresKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            wbFeatureKey.DeleteValue(executableName, False)
        End Using

        Using wbAccelKey = Registry.CurrentUser.OpenSubKey(
            hardwareAccelKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
            wbAccelKey.DeleteValue(executableName, False)
        End Using
    End Sub
End Class

推荐阅读