C#: Remove values not supported in UTF-8 from XML for XslCompiledTransform.Transform

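Problem description

Every time I try to run XslCompiledTransform.Transform, I get an exception because of invalid characters. One such character is, for example, "xFFFE".

How can I remove all invalid characters in C#?

XmlConvert.IsXmlChar does not work, because I check each individual character of the string, and none of the individual characters that spell out "xFFFE" is invalid on its own.

I always get the exception in XslCompiledTransform.Transform, but only when the XML document contains "xFFFE".

Here is the code: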

string document = "<?xml version=\"1.0\" encoding=\"utf-8\"?><FirstTag><Second><Third>;&#xFFFE;</Third></Second></FirstTag>";

public static string Clean(string document)
{
    XmlWriterSettings writerSettings = new XmlWriterSettings();

    XsltArgumentList argsList;
    document = RemoveXmlNotSupportedSigns(document);

    string result = "<?xml version=\"1.0\" encoding=\"utf-8\"?>";
    try
    {
        using (StringReader sr = new StringReader(document))
        {
            using (StringWriter sw = new StringWriter())
            {
                using (XmlReader xmlR = XmlReader.Create(sr))
                {
                    using (XmlWriter xmlW = XmlWriter.Create(sw, writerSettings))
                    {
                        Uri uri = new Uri(string.Format(CultureInfo.InvariantCulture, "{0}clean.xsl", Uri), UriKind.Relative);
                        argsList = new XsltArgumentList();

                        using (Stream xslSheet = Application.GetResourceStream(uri).Stream)
                        {
                            // Initialize the resolver with the URL of the resource path, without the file name
                            ResourceResolver resolver = new ResourceResolver(Uri);

                            using (XmlReader xmlReader = XmlReader.Create(xslSheet))
                            {
                                XsltSettings settings = new XsltSettings();
                                settings.EnableDocumentFunction = true;

                                // Load and Transform are instance methods, so create a transform object first
                                XslCompiledTransform transform = new XslCompiledTransform();
                                transform.Load(xmlReader, settings, resolver);

                                transform.Transform(xmlR, argsList, xmlW, resolver);
                            }
                            }
                        }
                    }
                }

                result = result + sw.ToString();
            }
        }
        return result;
    }
    catch (Exception Ex)
    {
        return result;
    }

}

Tags: c#, xml, xslt

Solution


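If you look at https://www.w3.org/TR/xml/#charsets, you will see that the allowed character range [#xE000-#xFFFD] clearly does not include #xFFFE. So this character cannot be part of a well-formed XML 1.0 document. In your code sample it is not XslCompiledTransform or XSLT that rejects it, but the underlying parser, XmlReader.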
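To make the point about the parser concrete, here is a minimal sketch that takes XSLT out of the picture entirely. The class and method names (`CharCheckDemo`, `ParsesWithDefaultReader`) are made up for illustration; only `XmlReader.Create`, `XmlReader.Read`, `XmlException` and `XmlConvert.IsXmlChar` come from the framework.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class CharCheckDemo
{
    // Returns true if an XmlReader with default settings (CheckCharacters = true)
    // can read the whole document, false if it throws an XmlException.
    public static bool ParsesWithDefaultReader(string xml)
    {
        try
        {
            using (StringReader sr = new StringReader(xml))
            using (XmlReader reader = XmlReader.Create(sr))
            {
                // Walk every node; no stylesheet or transform is involved.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this helper, `ParsesWithDefaultReader("<a>&#xFFFD;</a>")` should succeed, while the same document with `&#xFFFE;` should fail before any stylesheet is loaded. This matches `XmlConvert.IsXmlChar('\uFFFD')` being true and `XmlConvert.IsXmlChar('\uFFFE')` being false.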

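If you want to process such malformed input with XmlReader, you can use XmlReaderSettings with CheckCharacters = false and then, I think, eliminate such characters by checking each character with, for example, XmlConvert.IsXmlChar.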
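As a rough sketch of that idea without any extra library: read with `CheckCharacters = false`, so the reader decodes `&#xFFFE;` into the actual character instead of throwing, then filter the decoded text. The names `XmlCharFilter`, `StripInvalidXmlChars` and `ReadCleanText` are invented for this example. The surrogate-pair handling is an addition of this sketch, on the assumption that valid characters above U+FFFF (stored as two `char` values) should be kept rather than judged one half at a time.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class XmlCharFilter
{
    // Removes characters that XML 1.0 does not allow from already-decoded text.
    public static string StripInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        StringBuilder sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (i + 1 < input.Length && char.IsSurrogatePair(c, input[i + 1]))
            {
                // A well-formed surrogate pair encodes a code point in
                // [#x10000-#x10FFFF], which XML allows: keep both halves.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                // Everything else is kept only if XmlConvert.IsXmlChar accepts it.
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Concatenates all text and CDATA content of a document, cleaned.
    public static string ReadCleanText(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings { CheckCharacters = false };
        StringBuilder text = new StringBuilder();
        using (StringReader sr = new StringReader(xml))
        using (XmlReader reader = XmlReader.Create(sr, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text || reader.NodeType == XmlNodeType.CDATA)
                {
                    text.Append(StripInvalidXmlChars(reader.Value));
                }
            }
        }
        return text.ToString();
    }
}
```

This only shows where the filtering has to happen, namely on the values the reader hands back, not on the raw document string. To feed the cleaned data into XslCompiledTransform, the filtering needs to sit inside a reader.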

With the help of XmlWrappingReader from the MvpXml library ( https://github.com/keimpema/Mvp.Xml.NetStandard ), you can implement a filtering XmlReader:

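using System.Linq;
using System.Xml;
using Mvp.Xml.Common; // assumed namespace of XmlWrappingReader in MvpXml

public class MyWrappingReader : XmlWrappingReader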
{
    public MyWrappingReader(XmlReader baseReader) : base(baseReader) { }

    public override string Value => base.NodeType == XmlNodeType.Text || base.NodeType == XmlNodeType.CDATA || base.NodeType == XmlNodeType.Attribute ? CleanString(base.Value) : base.Value;

    public override string ReadString()
    {
        if (base.NodeType == XmlNodeType.Text || base.NodeType == XmlNodeType.CDATA || base.NodeType == XmlNodeType.Attribute)
        {
            return CleanString(base.ReadString());
        }
        else
        {
            return base.ReadString();
        }
    }

    public override string GetAttribute(int i)
    {
        return CleanString(base.GetAttribute(i));
    }

    public override string GetAttribute(string localName, string namespaceUri)
    {
        return CleanString(base.GetAttribute(localName, namespaceUri));
    }

    public override string GetAttribute(string name)
    {
        return CleanString(base.GetAttribute(name));
    }

    private string CleanString(string input)
    {
        // GetAttribute returns null for a missing attribute, so pass null through
        if (input == null) return null;
        return string.Join("", input.Where(c => XmlConvert.IsXmlChar(c)));
    }
}

Then use that reader to filter your input, and XslCompiledTransform should work on the cleaned-up XML. For example, the following runs fine:

        string document = "<?xml version=\"1.0\" encoding=\"utf-8\"?><FirstTag><Second att1='value&#xFFFE;'><Third>a&#xFFFE;</Third></Second></FirstTag>";

        // Identity transform: copies every attribute and node unchanged
        string xsltIdentity = @"<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'><xsl:template match='@* | node()'><xsl:copy><xsl:apply-templates select='@* | node()'/></xsl:copy></xsl:template></xsl:stylesheet>";

        using (StringReader sr = new StringReader(document))
        {
            using (XmlReader xr = new MyWrappingReader(XmlReader.Create(sr, new XmlReaderSettings() { CheckCharacters = false })))
            {
                using (StringReader xsltSrReader = new StringReader(xsltIdentity))
                {
                    using (XmlReader xsltReader = XmlReader.Create(xsltSrReader))
                    {
                        XslCompiledTransform processor = new XslCompiledTransform();
                        processor.Load(xsltReader);
                        processor.Transform(xr, null, Console.Out);
                        Console.WriteLine();
                    }
                }
            }
        }
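
The question notes that checking the raw string character by character fails because `&#xFFFE;` is spelled with perfectly valid characters. A different route, not taken from the answer above and offered only as a sketch, is to remove numeric character references that denote disallowed code points before the string ever reaches XmlReader. The names `CharRefFilter` and `RemoveInvalidCharRefs` are hypothetical, and the range check restates the Char production from the XML 1.0 specification linked above.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class CharRefFilter
{
    // Matches decimal (&#65;) and hexadecimal (&#x41;) character references.
    private static readonly Regex NumericCharRef =
        new Regex(@"&#(?:x([0-9A-Fa-f]+)|([0-9]+));", RegexOptions.CultureInvariant);

    // Drops references to code points that XML 1.0 does not allow; keeps the rest.
    public static string RemoveInvalidCharRefs(string xml)
    {
        return NumericCharRef.Replace(xml, match =>
        {
            bool isHex = match.Groups[1].Success;
            string digits = isHex ? match.Groups[1].Value : match.Groups[2].Value;
            NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;

            int codePoint;
            bool parsed = int.TryParse(digits, style, CultureInfo.InvariantCulture, out codePoint);

            // Unparseable (overflowing) or out-of-range references are removed.
            return parsed && IsValidXmlCodePoint(codePoint) ? match.Value : string.Empty;
        });
    }

    // Char production of XML 1.0:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static bool IsValidXmlCodePoint(int cp)
    {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

This is cruder than the wrapping reader. It works on text rather than parsed XML, so it would also rewrite something that merely looks like a reference inside a CDATA section or comment, and it does nothing about disallowed characters that appear literally instead of as references. It may still be enough when the input is known to contain only stray references such as `&#xFFFE;`.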
