html - 从 HttpWebRequest 下载时缺少某些元素?
问题描述
我正在使用 httpwebrequest 从给定的 url 下载数据,但很少有元素没有响应。
Dim Request As HttpWebRequest = CType(WebRequest.Create("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2"), HttpWebRequest)
Request.Timeout = 2 * 60 * 1000
Request.Proxy = Nothing
Request.AutomaticDecompression = DecompressionMethods.Deflate Or DecompressionMethods.GZip
Request.Credentials = System.Net.CredentialCache.DefaultCredentials
Dim HttpResp As HttpWebResponse
HttpResp = (CType(Request.GetResponse(), HttpWebResponse))
If HttpResp.StatusCode = HttpStatusCode.OK Then
Dim receiveStream As Stream = HttpResp.GetResponseStream()
Dim readStream As New StreamReader(receiveStream)
Dim sData As String
sData = readStream.ReadToEnd()
readStream.Close()
Else
End If
当我在 chrome 上打开 URL(https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2)并检查元素时,我可以看到这个元素(搜索 37005067200003B0F1FF2)但作为回应我我没有得到这个元素(搜索 37005067200003B0F1FF2)。
使用 webbrowser 控件的代码
Private Sub Button10_Click(sender As Object, e As EventArgs) Handles Button10.Click
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
Dim sURL As String = String.Format("https://www.royalmail.com/track-your-item#/tracking-results/37005067200003B0F1FF2")
Dim webBrowserForPrinting As WebBrowser = New WebBrowser()
webBrowserForPrinting.ScriptErrorsSuppressed = True
AddHandler webBrowserForPrinting.DocumentCompleted, AddressOf PrintDocument
webBrowserForPrinting.Url = New Uri(sURL)
webBrowserForPrinting.Navigate(sURL)
End Sub
Private Sub PrintDocument(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
Dim HTMD As HtmlDocument
HTMD = CType(sender, WebBrowser).Document
Dim HTC As HtmlElementCollection
If HTMD IsNot Nothing Then
HTC = HTMD.All
For Each ele As HtmlElement In HTC
MsgBox(ele.InnerHtml)
Next
End If
End Sub
解决方案
您需要激活 WebBrowser 高级功能才能成功完成解析过程。如果未启用这些功能,则标准 IE7 仿真中的 WebBrowser 将无法完成文档。失败是由大量脚本错误引起的。
我添加了一个带有静态方法 ( WebBrowserAdvancedFetures
) 的类,以将所需的值添加到注册表中。
WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures
在 Form 的构造函数中调用。
你可以回滚调用WebBrowserAdvancedFetures.DeactivateWBAdvancedFeatures
。
此程序如何工作:
- 实例化一个WebBrowser类 (
Private browser As WebBrowser
)。我们也可以使用 WebBrowser 控件(Form 容器可以托管的可见控件版本),它是一样的。 - 订阅其DocumentCompleted事件。每次
HtmlDocuments
内部主线之一WebBrowser.Document
完成时,都会升起。阅读 如何在 Frames/IFrames 中获取 HtmlElement 值?有关 HtmlDocuments 嵌套的更多详细信息。 - 在
DocumentCompleted
处理程序中,验证至少有一个文档已准备好进行解析,并检查WebBrowser.ReadyState
= WebBrowserReadyState.Complete
- 如果是,请搜索包含我们要查找的数据的 HtmlElements。
- 收集完所有数据后,引发一个事件,通知解析完成(如果需要,这也允许其他类的订阅者也收到通知。
EventArgs
不过,这需要一个自定义类)并禁用对HtmlDocument
(在这里,这是通过设置一个布尔字段来完成的)。 - 处理新数据(这里只有 a
String
和 aDateTime
对象),然后重置解析过程中使用的字段/变量。
Form.FormClosed
请记住在事件或自定义类Dispose()
方法中删除处理程序:
RemoveHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
RemoveHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
Public Event DocumentParsingComplete As EventHandler(Of EventArgs)
Private browser As WebBrowser = Nothing
Private trackingNumberValue As String = String.Empty
Private trackingDateValue As DateTime
Private documentParsed As Boolean = False
Private userAgent As String = "User-Agent: Mozilla/5.0 (Windows NT 10; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0"
Public Sub New()
InitializeComponent()
WebBrowserAdvancedFetures.ActivateWBAdvancedFeatures(Path.GetFileName(Application.ExecutablePath))
browser = New WebBrowser With {.ScriptErrorsSuppressed = True}
AddHandler DocumentParsingComplete, AddressOf OnDocumentParsingComplete
AddHandler browser.DocumentCompleted, AddressOf browser_DocumentCompleted
End Sub
Private Sub btnNavigate_Click(sender As Object, e As EventArgs) Handles btnNavigate.Click
browser.Navigate("")
browser.Document.OpenNew(True)
documentParsed = False
browser.Navigate("[Some URL]", "_self", Nothing, userAgent)
End Sub
Private Sub OnDocumentParsingComplete(sender As Object, e As EventArgs)
' Do whatever you need with these
Console.WriteLine(trackingNumberValue)
Console.WriteLine(trackingDateValue)
'Then reset for further use
trackingNumberValue = String.Empty
trackingDateValue = DateTime.MinValue
End Sub
Private Sub browser_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
Dim wb As WebBrowser = DirectCast(sender, WebBrowser)
If wb.ReadyState <> WebBrowserReadyState.Complete OrElse wb.Document.Forms.Count = 0 OrElse documentParsed Then Return
Dim trackingNumberClass As String = "tracking-number-value"
Dim trackingElement = wb.Document.GetElementsByTagName("SPAN").
OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Contains(trackingNumberClass))
Me.trackingNumberValue = trackingElement?.InnerText
Dim trackingDateClass As String = "ng-binding ng-scope"
Dim trackingDateElement = wb.Document.GetElementsByTagName("SPAN").
OfType(Of HtmlElement)().FirstOrDefault(Function(elm) elm.GetAttribute("className").Equals(trackingDateClass))
If trackingDateElement IsNot Nothing Then
Dim deliveryDate As String = trackingDateElement.InnerText.Split().Last().TrimEnd("."c)
Me.trackingDateValue = Date.ParseExact(deliveryDate, "dd-MM-yyyy", Nothing)
If Not String.IsNullOrEmpty(trackingNumberValue) Then
documentParsed = True
RaiseEvent DocumentParsingComplete(sender, EventArgs.Empty)
End If
End If
End Sub
使用此类激活/停用 WebBrowser 控件的高级功能:
Imports Microsoft.Win32
Imports System.Security.AccessControl
Public Class WebBrowserAdvancedFetures
Private Shared baseKeyName As String = "Software\Microsoft\Internet Explorer\Main\FeatureControl"
Private Shared featuresKey As String = baseKeyName & "\FEATURE_BROWSER_EMULATION"
Private Shared hardwareAccelKey As String = baseKeyName & "\FEATURE_GPU_RENDERING"
Public Shared Sub ActivateWBAdvancedFeatures(executableName As String)
Dim wbFeatureKey As RegistryKey = Nothing
Dim wbAccelKey As RegistryKey = Nothing
Try
wbFeatureKey = Registry.CurrentUser.OpenSubKey(featuresKey,
RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
If wbFeatureKey Is Nothing Then
wbFeatureKey = Registry.CurrentUser.CreateSubKey(featuresKey, True)
End If
wbFeatureKey.SetValue(executableName, 11001, RegistryValueKind.DWord)
wbAccelKey = Registry.CurrentUser.OpenSubKey(hardwareAccelKey,
RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
If wbAccelKey Is Nothing Then
wbAccelKey = Registry.CurrentUser.CreateSubKey(hardwareAccelKey, True)
End If
wbAccelKey.SetValue(executableName, 1, RegistryValueKind.DWord)
Finally
wbFeatureKey?.Dispose()
wbAccelKey?.Dispose()
End Try
End Sub
Public Shared Sub DeactivateWBAdvancedFeatures(executableName As String)
Using wbFeatureKey = Registry.CurrentUser.OpenSubKey(
featuresKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
wbFeatureKey.DeleteValue(executableName, False)
End Using
Using wbAccelKey = Registry.CurrentUser.OpenSubKey(
hardwareAccelKey, RegistryKeyPermissionCheck.ReadWriteSubTree, RegistryRights.WriteKey)
wbAccelKey.DeleteValue(executableName, False)
End Using
End Sub
End Class
推荐阅读
- f# - 尝试在函数之间传递多个值时出现 F# 问题
- crystal-reports - 在 Crystal Reports 上打印多列
- excel - Power Query、M 语言、使用特殊条件连续计数项目
- javascript - 在 JavaScript 中使用 AlpineJS $persist 而不是 HTML
- python - 如何在 Python 中比较两个 CSV 文件?
- android - 在 exoplayer 中手动控制视频质量
- facebook - Facebook 赞助商消息出价金额货币转换
- sql - QuestDB 列定义检查约束,NOT NULL 和 CHECK(condition/s)
- javascript - 如何防止重新渲染没有道具的子组件?
- angular - 将纯文本传递给 Angular 组件,在 typescript 级别访问 ng-content 值