Converting from XML to Docx using Docx4j

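Problem description

I am using the Java Docx4j library to convert a .docx file to its .xml representation, store the XML in a database, and then convert the XML back to a .docx file.

So far I can successfully convert the .docx file to XML and store it in the database. However, I am having trouble converting that XML back into .docx form. I have not edited the XML in any way. If I open the XML file in Word, it displays fine.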

String inputFilePath = args[0];
WordprocessingMLPackage wmlPackage = Docx4J.load(new File(inputFilePath));

ByteArrayOutputStream baos = new ByteArrayOutputStream();
Docx4J.save(wmlPackage, baos, Docx4J.FLAG_SAVE_FLAT_XML);

DatabaseController databaseController = new DatabaseController();
databaseController.commitXMLToDatabase(baos, "file-sample_1MB"); // Add the XML and filename to DB

String xml = databaseController.retrieveDocument("file-sample_1MB");

// Issue with the code below:
WordprocessingMLPackage testPkg = WordprocessingMLPackage.createPackage();
testPkg.getMainDocumentPart().unmarshal(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
testPkg.save(new File("src/main/resources/test1.docx")); 

I get the following error (I have removed some of the schema URLs that were listed):

Exception in thread "main" javax.xml.bind.JAXBException
 - with linked exception:
[javax.xml.bind.UnmarshalException
 - with linked exception:
[com.sun.istack.SAXParseException2; lineNumber: 1; columnNumber: 133; unexpected element (uri:"http://schemas.microsoft.com/office/2006/xmlPackage", local:"package"). Expected elements are <{urn:schemas-microsoft-com:office:excel}ClientData>,<{http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing}wsDr>,<{}xml>,<{http://opendope.org/xpaths}xpath>,<{http://opendope.org/conditions}xpathref>,<{http://opendope.org/xpaths}xpaths>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearLong>,<{http://schemas.openxmlformats.org/wordprocessingml/2006/main}yearShort>]]
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:586)
    at org.docx4j.openpackaging.parts.JaxbXmlPartXPathAware.unmarshal(JaxbXmlPartXPathAware.java:346)
    at DocxToXML.main(DocxToXML.java:37)
Caused by: javax.xml.bind.UnmarshalException

Any help would be greatly appreciated. I can post the .docx and .xml files if that would help.

Tags: java, xml, docx4j

Solution


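It is fixed now. The XML saved with Docx4J.FLAG_SAVE_FLAT_XML is a whole package (its root element is the "package" element named in the exception), so it has to be loaded as a package with Docx4J.load rather than unmarshalled into the main document part. The code I am using now is as follows: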

// retrieveDocument() gets the data from DB Blob as a byte[] Array 
// and returns an InputStream
InputStream xml = databaseController.retrieveDocument("Test1"); 
WordprocessingMLPackage pkg = Docx4J.load(xml);
Docx4J.save(pkg, new File("src/main/resources/output/test1.docx"));
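
The exception in the question reports the root element of the stored XML, which is a quick way to tell a flat package from a bare main document part. Below is a minimal sketch of that check using only the JDK's StAX parser, not Docx4j itself. The class and method names (FlatOpcCheck, rootElement, isFlatOpc) are hypothetical and chosen for illustration; the namespace URI is the one printed in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Docx4j.
class FlatOpcCheck {
    // Namespace of the package wrapper element, copied from the exception message.
    static final String FLAT_OPC_NS = "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Reads only as far as the first start tag and returns its qualified name.
    static QName rootElement(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The stored XML is not expected to need a DTD, so DTD support is turned off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new XMLStreamException("no root element found");
        } finally {
            reader.close();
        }
    }

    // True when the root element is the package wrapper rather than a single part.
    static boolean isFlatOpc(QName root) {
        return FLAT_OPC_NS.equals(root.getNamespaceURI())
                && "package".equals(root.getLocalPart());
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the XML string retrieved from the database.
        String sample = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<pkg:package xmlns:pkg=\"" + FLAT_OPC_NS + "\"></pkg:package>";
        QName root = rootElement(
                new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(root + " flatOpc=" + isFlatOpc(root));
    }
}
```

If isFlatOpc returns true for the stored document, the XML should be handed to Docx4J.load as a complete package, as in the code above, instead of to getMainDocumentPart().unmarshal(...).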
