c# - Is there a way to format a nested HTML list into a readable string?
Problem description
Hi, I have a string that is an HTML list:
<ol>
<li>afssafsafsafsafasf</li>
<li>safsafsafsafsafasfasf</li>
<li>safsafasfsafasfasfasf</li>
<li>+95+5454</li>
<li>sgsddsgd;l'm;l;mlm;lmml;l</li>
</ol>
<ol>
<li>544564654664654</li>
<ol>
<li>546464646464</li>
</ol>
</ol>
I want to convert this string into a string that looks like this:
- afssafsafsafsafsf
- safsafsafsafsafsfasf
- safsafasfsafsfasfasf
- +95+5454
- sgsddsgd;l'm;l;mlm;lmml;l
- 544564654664654
- 546464646464
Is there a way to do this?
What I have achieved so far with my code is a string that looks like this:
1.afssafsafsafsafsaf
2.safsafsafsafsafsfasf
3.safsafasfsafsfasfasf
4.+95+5454
5.sgsddsgd;l'm;l;mlm;lmml;l
6.544564654664654
7.546464646464
However, as you can see, the ordered lists are simply ignored and I just walk through the list items...
Here is the code:
protected void exportBtn_Click(object sender, EventArgs e)
{
    string src = s; // the source of the string
    src = ConvertToPlainText(src);
    var jaja = ReplaceWithIncrementingNumber(src, "\r\n*", "*");
}
public static string ConvertToPlainText(string html)
{
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    StringWriter sw = new StringWriter();
    ConvertTo(doc.DocumentNode, sw);
    sw.Flush();
    return sw.ToString();
}
/// <summary>
/// Count the words.
/// The content has to be converted to plain text before (using ConvertToPlainText).
/// </summary>
/// <param name="plainText">The plain text.</param>
/// <returns></returns>
public static int CountWords(string plainText)
{
    return !String.IsNullOrEmpty(plainText) ? plainText.Split(' ', '\n').Length : 0;
}
public static string Cut(string text, int length)
{
    if (!String.IsNullOrEmpty(text) && text.Length > length)
    {
        text = text.Substring(0, length - 4) + " ...";
    }
    return text;
}
private static void ConvertContentTo(HtmlNode node, TextWriter outText)
{
    foreach (HtmlNode subnode in node.ChildNodes)
    {
        ConvertTo(subnode, outText);
    }
}
static string ReplaceWithIncrementingNumber(string input, string find, string partToReplace)
{
    if (input == null || find == null ||
        partToReplace == null || !find.Contains(partToReplace))
    {
        return input;
    }
    // Get the index of the first occurrence of our 'find' string
    var index = input.IndexOf(find);
    // Track the number of occurrences we've found, to use as a replacement string
    var counter = 1;
    while (index > -1)
    {
        // Get the leading string up to '*', add the counter, then add the trailing string
        input = input.Substring(0, index) +
            find.Replace(partToReplace, $"{counter++}.") +
            input.Substring(index + find.Length);
        // Find the next occurrence of our 'find' string
        index = input.IndexOf(find, index + find.Length);
    }
    return input;
}
private static void ConvertTo(HtmlNode node, TextWriter outText)
{
    string html;
    switch (node.NodeType)
    {
        case HtmlNodeType.Comment:
            // don't output comments
            break;
        case HtmlNodeType.Document:
            ConvertContentTo(node, outText);
            break;
        case HtmlNodeType.Text:
            // script and style must not be output
            string parentName = node.ParentNode.Name;
            if ((parentName == "script") || (parentName == "style"))
                break;
            // get text
            html = ((HtmlTextNode)node).Text;
            // is it in fact a special closing node output as text?
            if (HtmlNode.IsOverlappedClosingElement(html))
                break;
            // check the text is meaningful and not a bunch of whitespaces
            if (html.Trim().Length > 0)
            {
                outText.Write(HtmlEntity.DeEntitize(html));
            }
            break;
        case HtmlNodeType.Element:
            switch (node.Name)
            {
                case "li":
                    {
                        outText.Write("\r\n*");
                        // outText.Write("\r\n");
                        break;
                    }
                case "p":
                    // treat paragraphs as crlf
                    outText.Write("\r\n");
                    break;
                case "br":
                    outText.Write("\r\n");
                    break;
            }
            if (node.HasChildNodes)
            {
                ConvertContentTo(node, outText);
                // outText.Write("\r\n");
            }
            break;
    }
}
Solution
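No answer was captured with this question, so what follows is only a minimal sketch of one possible approach, not a confirmed solution. The code in the question writes the same marker for every li regardless of how deeply it is nested, which is why the list structure disappears. The sketch below instead passes the current nesting depth down the recursion and indents each item accordingly. The class NestedListFormatter and the methods ToIndentedText, Walk and CollectText are hypothetical names introduced here for illustration; the "- " bullet and the two-space indent per level are assumptions about the desired output. It relies on the same HtmlAgilityPack types the question already uses.

```csharp
using System;
using System.Text;
using HtmlAgilityPack; // NuGet package already used by the question's code

// Hypothetical helper, not part of the original post.
public static class NestedListFormatter
{
    // Renders every <li> as "- text", indented two spaces per nested list level.
    public static string ToIndentedText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var sb = new StringBuilder();
        Walk(doc.DocumentNode, 0, sb);
        return sb.ToString().TrimEnd();
    }

    // depth = number of <ol>/<ul> elements enclosing the current node.
    private static void Walk(HtmlNode node, int depth, StringBuilder sb)
    {
        if (!node.HasChildNodes) return;

        foreach (HtmlNode child in node.ChildNodes)
        {
            switch (child.Name)
            {
                case "ol":
                case "ul":
                    // Entering a list: its items sit one level deeper.
                    Walk(child, depth + 1, sb);
                    break;
                case "li":
                    // Top-level items (depth 1) get no indent.
                    sb.Append(' ', Math.Max(depth - 1, 0) * 2).Append("- ");
                    var text = new StringBuilder();
                    CollectText(child, text);
                    sb.AppendLine(HtmlEntity.DeEntitize(text.ToString()).Trim());
                    // Pick up any lists nested inside this item.
                    Walk(child, depth, sb);
                    break;
                default:
                    // Descend through other wrappers (div, span, ...).
                    Walk(child, depth, sb);
                    break;
            }
        }
    }

    // Gathers the text that belongs to one item, skipping nested lists.
    private static void CollectText(HtmlNode node, StringBuilder sb)
    {
        if (!node.HasChildNodes) return;

        foreach (HtmlNode child in node.ChildNodes)
        {
            if (child.Name == "ol" || child.Name == "ul") continue;
            if (child.NodeType == HtmlNodeType.Text)
                sb.Append(((HtmlTextNode)child).Text);
            else if (child.NodeType == HtmlNodeType.Element)
                CollectText(child, sb);
        }
    }
}
```

Calling NestedListFormatter.ToIndentedText(src) in exportBtn_Click in place of the ConvertToPlainText / ReplaceWithIncrementingNumber pair should give one "- " line per item, with the item from the inner ol indented under 544564654664654. If numbering is wanted instead of bullets, a per-list counter could be passed along in the same way as depth.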