VB HtmlAgilityPack: join node strings, ignoring nodes that are not found

Problem description

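I only need the following conversion:

("c1", "c2", "c3", "c665", "c666") --> "c1, c2, c3, c665, c666"

("c1", "c2", "c3", node c665 not found, node c666 not found) --> "c1, c2, c3"

If a node is not found, c665 and c666 should be skipped.

If the search returns only 5 results from the web, the items after those should be ignored.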

Imports System
Imports HtmlAgilityPack
Imports System.Linq
                
Public Module Module1
    Public Sub Main()
        Dim Web As New HtmlWeb()
        Dim doc2 As New HtmlDocument()
        doc2 = Web.Load("https://download.cnet.com/s/pdf/")
        Try
            '' Categories (Single Node) 
            Dim Categories1 As HtmlNode = doc2.DocumentNode.SelectSingleNode("//div[@id='search-results']/a[1]/div/h4/div[1]")
            Dim c1 = Categories1.InnerText
            Dim Categories2 As HtmlNode = doc2.DocumentNode.SelectSingleNode("//div[@id='search-results']/a[2]/div/h4/div[1]")
            Dim c2 = Categories2.InnerText
            Dim Categories3 As HtmlNode = doc2.DocumentNode.SelectSingleNode("//div[@id='search-results']/a[3]/div/h4/div[1]")
            Dim c3 = Categories3.InnerText
            Dim Categories665 As HtmlNode = doc2.DocumentNode.SelectSingleNode("//div[@id='search-results']/a[665]/div/h4/div[1]")
            Dim c665 = Categories665.InnerText
            Dim Categories666 As HtmlNode = doc2.DocumentNode.SelectSingleNode("//div[@id='search-results']/a[666]/div/h4/div[1]")
            Dim c666 = Categories666.InnerText
            Dim array() As String = {c1, c2, c3, c665, c666}
            Dim Full As String = String.Join(", ", array.Where(Function(x) Not String.IsNullOrWhiteSpace(x)))
                If (Categories1.InnerText.Contains("")) Then
                    Console.WriteLine (Full)
                End if
        Catch
        End Try
    End Sub
End Module

https://dotnetfiddle.net/XJKRiw

Tags: vb.net, html-agility-pack

Solution


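If the node is not found, SelectSingleNode returns Nothing. You can check for that and skip any further processing of that node.

I noticed that you repeat the same code with only one small string changing each time. That is usually a good candidate for putting the code in a loop.

I am guessing that you are using an older version of Visual Studio, which is why you cannot use the null-conditional operator (?.), so I will show an example that does not rely on the newer features: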

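Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module Module1

    Sub Main()

        Dim Web As New HtmlWeb()
        Dim doc2 As HtmlDocument = Web.Load("https://download.cnet.com/s/pdf/")

        ' 1-based positions of the search results to read; 665 and 666 are not expected to exist.
        Dim indices = {1, 2, 3, 665, 666}
        Dim results As New List(Of String)

        For Each ix In indices
            Dim xpathQuery = String.Format("//div[@id='search-results']/a[{0}]/div/h4/div[1]", ix)
            Dim c = doc2.DocumentNode.SelectSingleNode(xpathQuery)

            ' SelectSingleNode returns Nothing when no node matches, so only keep the nodes that exist.
            If c IsNot Nothing Then
                results.Add(c.InnerText)
            End If

        Next

        If results.Count > 0 Then
            Console.WriteLine(String.Join(", ", results))
        Else
            Console.WriteLine("Nothing found.")
        End If

        ' Keep the console window open until Enter is pressed.
        Console.ReadLine()

    End Sub

End Module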

Output:

PDF Download, PDF Redirect, Free PDF to Word
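
For readers on Visual Basic 14 (Visual Studio 2015) or later, the same Nothing check can be folded into the null-conditional operator mentioned above. The following is only a minimal sketch under stated assumptions: the module name NullConditionalSketch, the helper JoinFoundNodes, and the inline HTML are hypothetical and stand in for the downloaded page so that the example does not depend on the live site's markup.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Public Module NullConditionalSketch

    ' Hypothetical helper: joins the InnerText of each requested result,
    ' skipping every index for which no node exists.
    Public Function JoinFoundNodes(doc As HtmlDocument, indices As Integer()) As String
        Dim results As New List(Of String)

        For Each ix In indices
            Dim xpathQuery = String.Format("//div[@id='search-results']/a[{0}]/div/h4/div[1]", ix)

            ' ?. evaluates to Nothing instead of throwing when SelectSingleNode finds no node.
            Dim text As String = doc.DocumentNode.SelectSingleNode(xpathQuery)?.InnerText

            If Not String.IsNullOrWhiteSpace(text) Then
                results.Add(text)
            End If
        Next

        Return String.Join(", ", results)
    End Function

    Public Sub Main()
        ' Assumed sample markup with three results, mirroring the structure the XPath expects.
        Dim doc As New HtmlDocument()
        doc.LoadHtml("<div id='search-results'>" &
                     "<a><div><h4><div>c1</div></h4></div></a>" &
                     "<a><div><h4><div>c2</div></h4></div></a>" &
                     "<a><div><h4><div>c3</div></h4></div></a>" &
                     "</div>")

        ' Indices 665 and 666 have no matching node and are skipped.
        Console.WriteLine(JoinFoundNodes(doc, {1, 2, 3, 665, 666}))
    End Sub

End Module
```

With the sample markup above, only the first three indices match, so the joined string contains c1, c2 and c3. Against the real page, the result depends on whatever the site returns at the time of the request.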

