Encode only accented characters in an HTML string

Problem description

I have the following function that accepts an HTML string, for example "<p>áêö</p>":

public string EncodeString(string input)
{
    // ...
    return System.Net.WebUtility.HtmlEncode(input);
}

I'd like to modify that function to output the same string, but with the accented characters as HTML entities. Using System.Net.WebUtility.HtmlEncode() encodes the entire string, including the HTML tags. I'd like to preserve the HTML tags if possible, since the string is parsed and rendered elsewhere in the application. Is this something that is better solved with a regex?

Tags: c#, .net, encoding, html-entities

Solution


You could use a library like AngleSharp to replace the content of HTML elements:

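// Requires the AngleSharp NuGet package (namespaces as of AngleSharp 0.10+).
using System.Threading.Tasks;
using AngleSharp;

public static async Task<string> EncodeString(string input)
{
    var context = BrowsingContext.New(Configuration.Default);
    var document = await context.OpenAsync(req => req.Content(input));
    // Assumes the input contains a <p> element; QuerySelector returns null otherwise.
    var pElement = document.QuerySelector("p");
    pElement.TextContent = System.Net.WebUtility.HtmlEncode(pElement.TextContent);
    return pElement.ToHtml();
}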

See it in action here: .NET Fiddle


For the more general case of nested elements, here is the modified code:

public static async Task<string> EncodeString(string input)
{
    var context = BrowsingContext.New(Configuration.Default);
    var document = await context.OpenAsync(req => req.Content(input));
    // Assumes the input has a single root element, which becomes the first child of <body>.
    var root = document.Body.FirstChild;
    EncodeTextNodes(root);
    return root.ToHtml();
}

// Recursively HTML-encodes the value of every text node below the given node.
// Element nodes are left alone; only their descendants' text is rewritten.
private static void EncodeTextNodes(INode content)
{
    foreach (var node in content.ChildNodes)
    {
        if (node.NodeType == NodeType.Text)
            node.NodeValue = System.Net.WebUtility.HtmlEncode(node.NodeValue);
        else
            EncodeTextNodes(node);
    }
}
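
If pulling in an HTML parser is more than you need, a parser-free alternative may be enough. The sketch below is not part of the AngleSharp approach above: it walks the string and rewrites every non-ASCII character as a numeric character reference, leaving everything else as-is. HTML markup characters (`<`, `>`, `"`, `&`) are all ASCII, so the tags pass through untouched. Because it never goes through a DOM serializer, the entities it writes cannot be escaped a second time. The class name `AccentEncoder` and the method name `EncodeNonAscii` are made up for this example, and it assumes the accented letters arrive in precomposed form (for instance `á` as U+00E1 rather than `a` followed by a combining accent).

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not from the original answer.
public static class AccentEncoder
{
    // Replaces every character outside the ASCII range with a decimal
    // numeric character reference (&#NNN;). ASCII characters, including
    // all HTML markup, are copied through unchanged.
    public static string EncodeNonAscii(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            int codePoint;

            if (c < 128)
            {
                // Plain ASCII: tags, attributes, and unaccented text.
                sb.Append(c);
                continue;
            }

            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                // Combine a surrogate pair into a single code point.
                codePoint = char.ConvertToUtf32(c, input[i + 1]);
                i++;
            }
            else
            {
                codePoint = c;
            }

            sb.Append("&#")
              .Append(codePoint.ToString(CultureInfo.InvariantCulture))
              .Append(';');
        }

        return sb.ToString();
    }
}
```

Calling `AccentEncoder.EncodeNonAscii("<p>áêö</p>")` returns `<p>&#225;&#234;&#246;</p>`. Note that this encodes every non-ASCII character, not only accented Latin letters; if you need a narrower set, adjust the `c < 128` test accordingly.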
