Using HtmlAgilityPack, nested lists and LINQ

Problem description

List<List<string>> table = playerDoc.DocumentNode
    .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[4]/table/tbody")
    .Descendants("tr")
    .Skip(1)
    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
    .ToList();

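I have this block of code that collects all the correct information from a table on a website. My problem is that the data looks like this: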

[Image: data]

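For example, I'm trying to figure out how to search the data for two matching strings, S16 and Pre, so that I can populate a class called CareerProperties (I can post the class's properties if needed). I've tried different variations of LINQ statements and foreach loops, but either an exception is thrown or I get everything in the table.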

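I'm trying to simplify my code, because retrieving the data with a foreach loop and XPaths takes about 3-4 seconds, whereas when I tested that LINQ statement it came back as Elapsed: 00:00:00.0068306.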

Any help would be appreciated, as I'm still learning C#. If I need to post a sample webpage or any other part of the code, I will. Thank you.

Edit:

foreach (var careerStats in findCareerNode)
{
    if (careerStats
        .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[1]").InnerText.Trim() != seasonId)
    {
        index++;
        continue;
    }
    else if (careerStats
       .SelectSingleNode(
           $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[2]")
       .InnerText.Trim() != "Reg")
    {
        index++;
        continue;
    }
    var type = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[2]")
        .InnerText;
    var record = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[3]")
        .InnerText;
    var amr = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[4]")
        ?.InnerText ?? "0.0";
    var goals = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[5]")
        .InnerText;
    var assists = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[6]")
        .InnerText;
    var sot = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[7]")
        .InnerText;
    var shots = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[8]")
        .InnerText;
    var passC = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[9]")
        .InnerText;
    var passA = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[10]")
        .InnerText;
    var keypass = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[11]")
        .InnerText;
    var interceptions = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[12]")
        .InnerText;
    var tac = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[13]")
        .InnerText;
    var tacA = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[14]")
        .InnerText;
    var blk = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[15]")
        .InnerText;
    var rc = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[16]")
        .InnerText;
    var yc = careerStats
        .SelectSingleNode(
            $"//*[@id='lg_team_user_leagues-{leagueId}']/div[{div}]/table/tbody/tr[{index}]/td[17]")
        .InnerText;
    ...
}
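One likely reason the loop above takes seconds: every query starts with //, and in XPath a path that begins with // is evaluated from the document root, regardless of which node SelectSingleNode is called on. So each of the queries in the loop (up to 18 per iteration in the code shown) searches the whole page again. The sketch below demonstrates that rule with System.Xml so it runs without the HtmlAgilityPack package; the markup and ids are invented for illustration. HtmlAgilityPack's SelectSingleNode follows the same XPath rule, so a relative path starting with ./ or .// is the way to stay inside the current node there as well.

```csharp
using System;
using System.Xml;

// Invented markup standing in for the scraped page.
var doc = new XmlDocument();
doc.LoadXml(
    "<html>" +
    "<div id='first'><table><tr><td>S15</td></tr></table></div>" +
    "<div id='second'><table><tr><td>S16</td></tr></table></div>" +
    "</html>");

XmlNode second = doc.SelectSingleNode("//div[@id='second']")!;

// Starts with "//": searched from the document root, even though it is called on 'second'.
string absolute = second.SelectSingleNode("//td")!.InnerText;

// Starts with ".//": searched only among the descendants of 'second'.
string relative = second.SelectSingleNode(".//td")!.InnerText;

Console.WriteLine($"//td  -> {absolute}"); // S15 (first <td> in the whole document)
Console.WriteLine($".//td -> {relative}"); // S16 (first <td> inside the 'second' div)
```

Applied to the loop above, a relative path such as ./td[3] evaluated on a row node only looks at that row, and reading all cells of a row once, as the LINQ query at the top does, avoids re-running a full-document search for every cell.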

Tags: c#, linq, html-agility-pack

Solution


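To filter the career stats table's data, you can use the LINQ method Where. The filtered data can then be used to create a list of CareerProperties objects with the LINQ method Select.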
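As a minimal illustration of Where and Select on the same shape of data (a List<List<string>>, one inner list per table row), the sketch below uses hard-coded rows in place of the scraped table. The cell values are invented, and since the real CareerProperties class was not posted, each row is projected into a named tuple instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented rows in the shape the question's query produces (one List<string> per <tr>):
// cell 0 = season, cell 1 = type, cell 2 = record, cell 3 = AMR.
var rows = new List<List<string>>
{
    new List<string> { "S15", "Reg", "10-2-3", "7.1" },
    new List<string> { "S16", "Pre", "2-0-1", "6.8" },
    new List<string> { "S16", "Reg", "12-1-2", "7.4" },
};

string seasonId = "S16";

// Where keeps only the rows whose first two cells match;
// Select then maps each remaining row to one object (a named tuple here).
var stats = rows
    .Where(tr => tr[0] == seasonId && tr[1] == "Reg")
    .Select(tr => (Type: tr[1], Record: tr[2], Amr: tr[3]))
    .ToList();

foreach (var s in stats)
    Console.WriteLine($"{s.Type} {s.Record} {s.Amr}"); // Reg 12-1-2 7.4
```

Swapping "Reg" for "Pre" in the Where call selects the preseason row instead, which is the S16/Pre case from the question.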

Here is how we can get the career stats for the selected seasonId and Reg:

// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
// Now the return type is a List of CareerProperties.
List<CareerProperties> table = playerDoc.DocumentNode
    .SelectSingleNode($"//*[@id='lg_team_user_leagues-{leagueId}']/div[4]/table/tbody")
    .Descendants("tr")
    .Skip(1)
    // Up to here is your code. Here you select all rows from the table.
    // Each row is represented as a List<string>.
    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
    // Here we filter table rows by seasonId (first cell) and "Reg" (second cell).
    .Where(tr => tr[0] == seasonId && tr[1] == "Reg")
    // Here we create CareerProperties objects from the filtered rows.
    // List indexes are 0-based, so XPath td[n] corresponds to tr[n - 1].
    .Select(tr => new CareerProperties
        {
            Type = tr[1],
            Record = tr[2],
            Amr = tr[3],
            Goals = tr[4],
            Assists = tr[5],
            // ... fill the remaining properties (tr[6] through tr[16]) the same way.
        })
    .ToList();
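The question mentions getting either an exception or the whole table. With index-based access, one possible cause of the exception is a row with fewer cells than expected: because the query reads only td elements, a row made of th cells becomes an empty list, and tr[0] then throws ArgumentOutOfRangeException. The sketch below shows a length check inside Where, plus culture-invariant number parsing with a 0 fallback, similar in spirit to the "0.0" default in the question's loop. The rows are invented, whether the real page contains such short rows is an assumption, a named tuple stands in for the unposted CareerProperties class, and ParseDoubleOrZero and ParseIntOrZero are hypothetical helper names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Invented rows in the shape the question's query builds (one List<string> per <tr>).
// The empty list stands for a row with no <td> cells (assumption: e.g. a header row of <th>).
var rows = new List<List<string>>
{
    new List<string> { "S16", "Reg", "12-1-2", "7.4", "9", "4" },
    new List<string>(),
    new List<string> { "S16", "Pre", "2-0-1", "", "1", "0" },
};

string seasonId = "S16";
const int ExpectedColumns = 6; // the question's XPaths go up to td[17]; adjust to the real table

// Hypothetical helpers: parse a cell, falling back to 0 for empty or non-numeric text.
// InvariantCulture makes "7.4" parse the same way where ',' is the decimal separator.
static double ParseDoubleOrZero(string s) =>
    double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out var v) ? v : 0.0;
static int ParseIntOrZero(string s) =>
    int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out var v) ? v : 0;

// Without a length check, the empty row throws as soon as tr[0] is read.
bool threw = false;
try
{
    rows.Where(tr => tr[0] == seasonId).ToList();
}
catch (ArgumentOutOfRangeException)
{
    threw = true;
}

// With the check, short rows are skipped before any cell is indexed.
var preseason = rows
    .Where(tr => tr.Count >= ExpectedColumns && tr[0] == seasonId && tr[1] == "Pre")
    .Select(tr => (Record: tr[2], Amr: ParseDoubleOrZero(tr[3]), Goals: ParseIntOrZero(tr[4])))
    .ToList();

Console.WriteLine($"Unchecked query threw: {threw}");
foreach (var p in preseason)
    Console.WriteLine($"{p.Record} AMR={p.Amr} Goals={p.Goals}");
```

Putting tr.Count first matters because && short-circuits: once the count test fails, the indexers on the right are never evaluated.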
