Converting .doc to HTML with Apache POI cuts off the final HTML file

Problem description

I am using Apache POI 4.0.0 to convert .doc files to .html.

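    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.StringWriter;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerConfigurationException;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.TransformerFactoryConfigurationError;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.PicturesManager;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.w3c.dom.Document;

    // The enclosing class is not shown in the original post; DocToHtml is a placeholder name.
    public class DocToHtml {

        // Converts a .doc file to an HTML string; embedded pictures are written to imagedir.
        private static String ProcessingDoc(File doc, String imagedir) throws IOException, ParserConfigurationException, TransformerConfigurationException, TransformerFactoryConfigurationError {
            Document html_file = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            WordToHtmlConverter converter = new WordToHtmlConverter(html_file);

            converter.setPicturesManager(new PicturesManager() {
                @Override
                public String savePicture(byte[] content, PictureType pictureType, String suggestedName, float widthInches,
                        float heightInches) {
                    // Create the directory the picture is actually written to
                    // (the original created the document's parent directory instead).
                    File imgDir = new File(imagedir);
                    if (!imgDir.exists()) {
                        imgDir.mkdirs();
                    }
                    // try-with-resources closes the stream even if the write fails.
                    try (FileOutputStream out = new FileOutputStream(new File(imgDir, suggestedName))) {
                        out.write(content);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    // The returned name is what the converter uses as the image reference.
                    return suggestedName;
                }
            });

            // Close the input stream and the document once the conversion is done.
            try (FileInputStream in = new FileInputStream(doc);
                 HWPFDocument doc_file = new HWPFDocument(in)) {
                converter.processDocument(doc_file);
            }

            // Serialize the DOM tree built by the converter into an HTML string.
            StringWriter stringWriter = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            try {
                transformer.transform(
                        new DOMSource(converter.getDocument()),
                        new StreamResult(stringWriter));
            } catch (TransformerException e) {
                // A failure here is only printed; the method still returns whatever was written.
                e.printStackTrace();
            }
            return stringWriter.toString();
        }
    }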

However, POI produces incomplete HTML files that are cut off at different points in the file. The output looks like this:

<some text of html document>
                <tr class="r1">
                    <td class="td49">
                        <p class="p17"></p>
                    </td><td class="td50">
                        <p class="p17"></p>
                    </td><td class="td51">

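That is where the HTML file ends. No errors are reported during the conversion.

Why do I get neither an error nor a complete file?

Thanks for your answers!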

Tags: java, apache-poi

Solution
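
One thing worth checking, since `ProcessingDoc` only returns a string, is how that string is then written to disk. If the calling code (not shown in the question) uses a buffered writer that is never flushed or closed, the tail of the output can be lost without any exception, which would match a file that ends at varying positions. This is an assumption about the unseen caller, not a confirmed diagnosis. A minimal sketch, where `HtmlFileWriter` and `writeHtml` are hypothetical names that do not come from the question:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original post or of Apache POI.
final class HtmlFileWriter {

    // Writes the whole HTML string to target as UTF-8.
    static void writeHtml(String html, Path target) throws IOException {
        // try-with-resources closes the writer, and closing flushes
        // any characters still held in its buffer.
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(html);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by ProcessingDoc (assumed input).
        String html = "<html><body><p>sample</p></body></html>";
        Path target = Files.createTempFile("converted", ".html");
        writeHtml(html, target);
        // Compare the file size with the encoded length to confirm nothing was cut.
        long expected = html.getBytes(StandardCharsets.UTF_8).length;
        System.out.println("complete: " + (Files.size(target) == expected));
    }
}
```

If the file is still truncated when written this way, the problem is more likely in the conversion itself, for example a `TransformerException` that is only printed by `e.printStackTrace()` and never rethrown.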

