Is there a way to format a nested html list into a readable string

Problem description

Hi, I have a string that is an html list:

<ol>
  <li>afssafsafsafsafasf</li>
  <li>safsafsafsafsafasfasf</li>
  <li>safsafasfsafasfasfasf</li>
  <li>+95+5454</li>
  <li>sgsddsgd;l'm;l;mlm;lmml;l</li>
</ol> 
<ol>
  <li>544564654664654</li> 
  <ol> 
    <li>546464646464</li>
  </ol>
</ol>

I want to convert this string into a string that looks like this:

  1. afssafsafsafsafasf
  2. safsafsafsafsafasfasf
  3. safsafasfsafasfasfasf
  4. +95+5454
  5. sgsddsgd;l'm;l;mlm;lmml;l
  1. 544564654664654
    1. 546464646464

Is there a way to do this?

What I have achieved with my code so far is a string that looks like this:

1.afssafsafsafsafasf
2.safsafsafsafsafasfasf
3.safsafasfsafasfasfasf
4.+95+5454
5.sgsddsgd;l'm;l;mlm;lmml;l
6.544564654664654
7.546464646464

However, as you can see, the ordered lists are simply ignored; I am only going through the list items...

Here is the code:

protected void exportBtn_Click(object sender, EventArgs e)
{
    string src = s; // s holds the source HTML string
    src = ConvertToPlainText(src);
    // Replace each "\r\n*" marker with "\r\n1.", "\r\n2.", ...
    var jaja = ReplaceWithIncrementingNumber(src, "\r\n*", "*");
}


public static string ConvertToPlainText(string html)
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            StringWriter sw = new StringWriter();
            ConvertTo(doc.DocumentNode, sw);
            sw.Flush();
            return sw.ToString();
        }





        /// <summary>
        /// Count the words.
        /// The content has to be converted to plain text before (using ConvertToPlainText).
        /// </summary>
        /// <param name="plainText">The plain text.</param>
        /// <returns>The number of tokens separated by spaces or newlines; 0 if the text is null or empty.</returns>
        public static int CountWords(string plainText)
        {
            return !String.IsNullOrEmpty(plainText) ? plainText.Split(' ', '\n').Length : 0;
        }


        public static string Cut(string text, int length)
        {
            if (!String.IsNullOrEmpty(text) && text.Length > length)
            {
                text = text.Substring(0, length - 4) + " ...";
            }
            return text;
        }


        private static void ConvertContentTo(HtmlNode node, TextWriter outText)
        {
            foreach (HtmlNode subnode in node.ChildNodes)
            {
                ConvertTo(subnode, outText);
            }
        }

        static string ReplaceWithIncrementingNumber(string input, string find, string partToReplace)
        {
            if (input == null || find == null ||
                partToReplace == null || !find.Contains(partToReplace))
            {
                return input;
            }

            // Get the index of the first occurrence of our 'find' string
            var index = input.IndexOf(find);

            // Track the number of occurrences we've found, to use as a replacement string
            var counter = 1;


            while (index > -1)
            {
                // Build the numbered replacement (e.g. "\r\n1.") and splice it in place of the match
                string replacement = find.Replace(partToReplace, $"{counter++}.");
                input = input.Substring(0, index) +
                        replacement +
                        input.Substring(index + find.Length);

                // Find the next occurrence, starting just after the text we inserted
                index = input.IndexOf(find, index + replacement.Length);
            }

            return input;
        }


        private static void ConvertTo(HtmlNode node, TextWriter outText)
        {
            string html;
            switch (node.NodeType)
            {
                case HtmlNodeType.Comment:
                    // don't output comments
                    break;

                case HtmlNodeType.Document:
                    ConvertContentTo(node, outText);
                    break;

                case HtmlNodeType.Text:
                    // script and style must not be output
                    string parentName = node.ParentNode.Name;
                    if ((parentName == "script") || (parentName == "style"))
                        break;

                    // get text
                    html = ((HtmlTextNode)node).Text;

                    // is it in fact a special closing node output as text?
                    if (HtmlNode.IsOverlappedClosingElement(html))
                        break;

                    // check the text is meaningful and not a bunch of whitespaces
                    if (html.Trim().Length > 0)
                    {
                        outText.Write(HtmlEntity.DeEntitize(html));
                    }
                    break;

                case HtmlNodeType.Element:
                    switch (node.Name)
                    {

                        case "li":
                            // write a marker for each list item; it is replaced by a number later
                            outText.Write("\r\n*");
                            break;



                        case "p":
                            // treat paragraphs as crlf
                            outText.Write("\r\n");
                            break;
                        case "br":
                            outText.Write("\r\n");
                            break;
                    }

                    if (node.HasChildNodes)
                    {
                        ConvertContentTo(node, outText);
                    }
                    break;
            }
        }

Tags: c#, asp.net, list

Solution
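
One possible approach, offered as a sketch rather than a definitive answer: the "\r\n*" marker written for each li carries no information about which ol the item belongs to, so numbering cannot restart and depth is lost. Instead of post-processing markers, the traversal itself can keep one counter per open ol on a stack. The stack depth then gives the indentation. The names ListFormatter, ConvertListsToText and Walk below are hypothetical helpers, not part of HtmlAgilityPack, and the sketch assumes two spaces of indentation per level and ignores ul lists.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

public static class ListFormatter
{
    // Hypothetical helper: converts HTML to text, numbering <li> items
    // per enclosing <ol> and indenting by nesting depth.
    public static string ConvertListsToText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var output = new StringBuilder();
        // One counter per currently open <ol>; Count is the nesting depth.
        var counters = new Stack<int>();
        Walk(doc.DocumentNode, output, counters);
        return output.ToString();
    }

    private static void Walk(HtmlNode node, StringBuilder output, Stack<int> counters)
    {
        if (node.NodeType == HtmlNodeType.Comment)
            return; // comments are not output

        if (node.NodeType == HtmlNodeType.Text)
        {
            // script and style content must not be output
            string parentName = node.ParentNode.Name;
            if (parentName == "script" || parentName == "style")
                return;

            // Simplification: trim each text node and skip whitespace-only ones
            string text = HtmlEntity.DeEntitize(((HtmlTextNode)node).Text).Trim();
            if (text.Length > 0)
                output.Append(text);
            return;
        }

        if (node.Name == "ol")
        {
            counters.Push(0); // a new list starts counting from zero
            foreach (HtmlNode child in node.ChildNodes)
                Walk(child, output, counters);
            counters.Pop();   // leaving the list restores the outer counter
            return;
        }

        if (node.Name == "li" && counters.Count > 0)
        {
            // Increment the counter of the innermost open list
            int number = counters.Pop() + 1;
            counters.Push(number);

            if (output.Length > 0)
                output.Append("\r\n");
            output.Append(new string(' ', 2 * counters.Count)); // assumed: 2 spaces per level
            output.Append(number).Append(". ");
        }

        foreach (HtmlNode child in node.ChildNodes)
            Walk(child, output, counters);
    }
}
```

Calling ListFormatter.ConvertListsToText(src) on the HTML from the question should give the layout asked for: numbering restarts at 1 for the second list, and the nested item is indented by four spaces. Because the nested ol is handled wherever it appears, the same code covers both an ol placed directly inside another ol (as in the question) and an ol placed inside an li.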

