Converting HTML text content to Word using OpenXml

Problem description

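I have a rich text box that holds HTML-formatted text, and copied images can also be pasted into it. I tried the AlternativeFormatImportPart and AltChunk approach. It generates the document, but I get the errors below. Please let me know what I am missing here.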

[Two screenshots of the error messages; images not available]

  
// Build the stream from the bytes returned by HtmlToWord (a .docx package).
// Commented-out alternatives that feed the raw HTML string h instead:
// MemoryStream ms = new MemoryStream(new UTF8Encoding(true).GetPreamble().Concat(Encoding.UTF8.GetBytes(h)).ToArray());
// MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(h));
MemoryStream ms = new MemoryStream(HtmlToWord(fileContent));

// Create alternative format import part.
AlternativeFormatImportPart chunk =
    mainDocPart.AddAlternativeFormatImportPart(
        "application/xhtml+xml", altChunkId);
chunk.FeedData(ms);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;

public static byte[] HtmlToWord(String html)
{
    // Leftovers from a file-based version: nothing below writes to this file or uses doc.
    const string filename = "test.docx";
    if (File.Exists(filename)) File.Delete(filename);
    var doc = new Document();

    using (MemoryStream generatedDocument = new MemoryStream())
    {
        // Create an in-memory .docx package.
        using (WordprocessingDocument package = WordprocessingDocument.Create(
            generatedDocument, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;

            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            // Configure the HTML converter (presumably the HtmlToOpenXml library).
            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.ExcludeLinkAnchor = true;
            converter.RefreshStyles();
            converter.ImageProcessing = ImageProcessing.AutomaticDownload;
            //converter.BaseImageUrl = new Uri(domainNameURL + "Images/");
            converter.ConsiderDivAsParagraph = false;

            // Parse the HTML and append the resulting elements to the document body.
            Body body = mainPart.Document.Body;
            var paragraphs = converter.Parse(html);
            for (int i = 0; i < paragraphs.Count; i++)
            {
                body.Append(paragraphs[i]);
            }
            mainPart.Document.Save();
        }

        // Return the bytes only after the package has been disposed, so that it is fully written.
        return generatedDocument.ToArray();
    }
}

Tags: c#, ms-word, openxml-sdk

Solution


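There was a problem with feeding AlternativeFormatImportPart from a MemoryStream: the resulting document was not well-formed. So I took a different approach: the HtmlToWord method saves the HTML content as a Word file, and a FileStream then reads that file's content and feeds it to the AlternativeFormatImportPart.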

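// This HtmlToWord overload (not shown) saves the .docx to disk and returns its path via the out parameter.
string docFileName;
HtmlToWord(fileContent, out docFileName);

// Create alternative format import part.
AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
    AlternativeFormatImportPartType.WordprocessingML, altChunkId);

// Feed the saved file to the part; dispose the stream afterwards to release the file handle.
using (FileStream fileStream = File.Open(docFileName, FileMode.Open))
{
    chunk.FeedData(fileStream);
}

AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;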
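
The snippet above stops once the AltChunk element has been created. For the imported content to appear, that element still has to be added to the body of the target document. A minimal end-to-end sketch follows, under some assumptions: the class AltChunkExample, the method AppendDocxChunk, and its parameters targetPath and chunkPath are hypothetical names introduced here for illustration, and the target document is assumed to exist already and to have a body. Note also that this version declares the part as WordprocessingML rather than "application/xhtml+xml", which matches the .docx bytes that HtmlToWord produces; that may matter as much as the kind of stream used.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class AltChunkExample
{
    // Hypothetical helper: imports the .docx at chunkPath into the
    // existing document at targetPath by means of an altChunk.
    public static void AppendDocxChunk(string targetPath, string chunkPath, string altChunkId)
    {
        // Open the target document for editing.
        using (WordprocessingDocument target = WordprocessingDocument.Open(targetPath, true))
        {
            MainDocumentPart mainDocPart = target.MainDocumentPart;

            // The part type has to describe the data that is fed in; here it is a .docx package.
            AlternativeFormatImportPart chunk = mainDocPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);

            using (FileStream fileStream = File.Open(chunkPath, FileMode.Open, FileAccess.Read))
            {
                chunk.FeedData(fileStream);
            }

            // The AltChunk element points at the import part through its relationship id.
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;

            // Without this step the imported part is stored in the package but never shown.
            mainDocPart.Document.Body.Append(altChunk);
            mainDocPart.Document.Save();
        }
    }
}
```

A call might look like AltChunkExample.AppendDocxChunk("report.docx", docFileName, "AltChunkId1"), where "report.docx" is a placeholder path. The id has to be unique within the main document part, so several imported chunks need distinct ids.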
