Getting only the first item

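  • Problem description

    Here is the problem. I have a website with several subpages.

    Subpages: DAMSKIE, MĘSKIE, DZIECIĘCE, SPORT, AKCESORIA, PREMIUM, TOREBKI, WYPRZEDAŻ

    Each subpage has a few category elements, such as "Półbuty", "Klapki", etc.

    I can get the subpages, but I cannot get the list of category elements (Półbuty, Klapki, etc.). If the list looks like "Półbuty", "Klapki", "Obcasy", my code only gets "Półbuty"; it does not get "Klapki" or "Obcasy".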

    [Image of a subpage and the list of elements I am trying to get]

    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;
    
    namespace Crawler_Shoes
    {
        public class Crawl
        {
            private static string navBar = "megamenu__item";
            private const string shoesTypes = "sidebar-section__wrapper sidebar-section__wrapper--categories";
            private static string mainSite = "https://www.eobuwie.com.pl/";
            public static List<string> categoriesNames = new List<string>();
            public static List<string> linksNames = new List<string>();
            public static List<string> categoriesOfCategoriesNames = new List<string>();
            private readonly List<Shoes> shoes = new List<Shoes>();
    
            public static async Task<IEnumerable<HtmlNode>> HttpClient(string site, string descendant, string equals)
            {
                var httpClient = new HttpClient();
                var html = await httpClient.GetStringAsync(site);
                var htmlDocument = new HtmlDocument();
                htmlDocument.LoadHtml(html);
                return htmlDocument.DocumentNode.Descendants(descendant)
                    .Where(node => node.GetAttributeValue("class", "").Equals(equals)).ToList();
            }
            public static async Task GetCategories()
            {
                var menu = await HttpClient(mainSite, "li", navBar);                      
                foreach (var nav in menu)
                {
                    //links.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                    categoriesNames.Add(nav.Descendants("a").FirstOrDefault().InnerText); //gets names of categories
                    linksNames.Add(nav.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value); //gets links for categories
                    if(categoriesNames.Last() == "\n\t\t\tWyprzedaż\t\t")
                    {
                        categoriesNames.Remove(categoriesNames.Last());
                        linksNames.Remove(categoriesNames.Last());
                    }
                }
                Crawl.GetCategoriesofCategories();
            }
            public static async Task GetCategoriesofCategories()
            {
                for (var i = 0; i <= categoriesNames.Count-1; i++)
                {
                    var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                    categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                    foreach(var li in categories)
                    {
                        categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                    }
                }
            }
        }
    }
    

    The problematic part:

        public static async Task GetCategoriesofCategories()
        {
            for (var i = 0; i <= categoriesNames.Count-1; i++)
            {
                var categories = await HttpClient(linksNames.ElementAt(i), "ul", shoesTypes);
                categoriesOfCategoriesNames.Add(categoriesNames.ElementAt(i));
                foreach(var li in categories)
                {
                    categoriesOfCategoriesNames.Add(li.Descendants("a").FirstOrDefault().ChildAttributes("href").FirstOrDefault().Value);
                }
            }
        }
    

    Tags: c#, html-agility-pack

    Solution


    I had success with this:

    string url = "https://www.eobuwie.com.pl/damskie.html";
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(url);
    // Select the category list in the sidebar by its exact class attribute.
    var sidebar = doc.DocumentNode.SelectSingleNode("//ul[@class='sidebar-section__wrapper sidebar-section__wrapper--categories']");
    // Take every <li> child of the list, not just the first one.
    var categories = sidebar.SelectNodes("li");
    foreach (var category in categories)
    {
        // Each <li> holds one <a> whose text is the category name.
        var anchor = category.SelectSingleNode("a");
        string shoeCategory = anchor.InnerText.Trim();
        Console.WriteLine(shoeCategory);
    }
    

    This is a bit different from how you were doing it, but I hope you can at least take some hints from it and apply them to your own code.
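
    Applied to the loop in the question, the same idea might look like the sketch below. Reading the question's code, the likely cause is that the call with "ul" and shoesTypes returns the single matching <ul>, so the foreach runs once and Descendants("a").FirstOrDefault() picks only the first link. The sketch walks every <a> under the list instead. The names CategoryListSketch, ExtractCategoryNames, ListClass and SampleHtml, as well as the inline sample markup, are assumptions made for illustration; the real page's HTML may differ.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    public static class CategoryListSketch
    {
        // Class attribute taken from the question (shoesTypes).
        public const string ListClass =
            "sidebar-section__wrapper sidebar-section__wrapper--categories";

        // Hypothetical sample markup standing in for a downloaded subpage.
        public const string SampleHtml = @"
    <ul class=""sidebar-section__wrapper sidebar-section__wrapper--categories"">
      <li><a href=""/polbuty.html"">Półbuty</a></li>
      <li><a href=""/klapki.html"">Klapki</a></li>
      <li><a href=""/obcasy.html"">Obcasy</a></li>
    </ul>";

        public static List<string> ExtractCategoryNames(string html, string listClass)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            return doc.DocumentNode.Descendants("ul")
                .Where(ul => ul.GetAttributeValue("class", "") == listClass)
                // The question's loop kept only the first <a> of each matching <ul>;
                // SelectMany flattens every <a> of every matching <ul> instead.
                .SelectMany(ul => ul.Descendants("a"))
                .Select(a => a.InnerText.Trim())
                .ToList();
        }

        public static void Main()
        {
            // Prints all names found in the sample list, one per line.
            foreach (var name in ExtractCategoryNames(SampleHtml, ListClass))
            {
                Console.WriteLine(name);
            }
        }
    }
    ```

    Separately, two things in the question's GetCategories may be worth checking: linksNames.Remove(categoriesNames.Last()) runs after the name has already been removed, so it probably does not remove the matching link, and GetCategoriesofCategories() is called without await.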

    If you also need the links, add the following:

    string shoeCategoryLink = anchor.GetAttributeValue("href", string.Empty);
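
    To keep each name together with its link, something along these lines may help. It is only a sketch under assumptions: the names CategoryLinkSketch and ExtractCategories and the sample markup are made up for illustration, and resolving the href against a base URL is an extra step that only matters if the site uses relative links. It also guards against SelectNodes returning null when nothing matches.

    ```csharp
    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    public static class CategoryLinkSketch
    {
        // Hypothetical sample markup; the real page may be structured differently.
        public const string SampleHtml = @"
    <ul class=""sidebar-section__wrapper sidebar-section__wrapper--categories"">
      <li><a href=""/polbuty.html"">Półbuty</a></li>
      <li><a href=""https://www.eobuwie.com.pl/klapki.html"">Klapki</a></li>
      <li><a>No link here</a></li>
    </ul>";

        public static List<(string Name, string Link)> ExtractCategories(string html, string baseUrl)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            var result = new List<(string Name, string Link)>();
            var anchors = doc.DocumentNode.SelectNodes(
                "//ul[@class='sidebar-section__wrapper sidebar-section__wrapper--categories']/li/a");
            if (anchors == null)
            {
                return result; // SelectNodes yields null, not an empty list, when nothing matches
            }

            var baseUri = new Uri(baseUrl);
            foreach (var anchor in anchors)
            {
                string name = HtmlEntity.DeEntitize(anchor.InnerText).Trim();
                string href = anchor.GetAttributeValue("href", string.Empty);
                if (href.Length == 0)
                {
                    continue; // skip anchors without a link
                }
                // Relative hrefs are resolved against the base; absolute ones pass through.
                result.Add((name, new Uri(baseUri, href).AbsoluteUri));
            }
            return result;
        }

        public static void Main()
        {
            foreach (var (name, link) in ExtractCategories(SampleHtml, "https://www.eobuwie.com.pl/"))
            {
                Console.WriteLine(name + " -> " + link);
            }
        }
    }
    ```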
    
