How to render special characters during HTML-to-PDF conversion with iText and XMLWorker?

Problem description

Hi, I am using iText and XMLWorker for HTML-to-PDF conversion in Java, as follows:

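    import com.itextpdf.text.Document;
    import com.itextpdf.text.pdf.PdfWriter;
    import com.itextpdf.tool.xml.Pipeline;
    import com.itextpdf.tool.xml.XMLWorker;
    import com.itextpdf.tool.xml.XMLWorkerFontProvider;
    import com.itextpdf.tool.xml.XMLWorkerHelper;
    import com.itextpdf.tool.xml.css.CssFilesImpl;
    import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
    import com.itextpdf.tool.xml.html.CssAppliersImpl;
    import com.itextpdf.tool.xml.html.HTML;
    import com.itextpdf.tool.xml.html.TagProcessorFactory;
    import com.itextpdf.tool.xml.html.Tags;
    import com.itextpdf.tool.xml.parser.XMLParser;
    import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
    import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
    import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
    import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    // ImageTagProcessor is the asker's own class, not part of iText.

    public void convertHtmlToPdf(StringBuilder content, String path) throws Exception {
        String methodName = "convertHtmlToPdf";

        try {
            // DONTLOOKFORFONTS: skip scanning font directories; only explicitly registered fonts are used
            XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
            fontProvider.register("C:/Users/Aaryan/Downloads/arial.ttf");

            final OutputStream file = new FileOutputStream(new File(path));
            final Document document = new Document();
            final PdfWriter writer = PdfWriter.getInstance(document, file);
            document.open();

            // Replace the default <img> handler with a custom one
            final TagProcessorFactory tagProcessorFactory = Tags.getHtmlTagProcessorFactory();
            tagProcessorFactory.removeProcessor(HTML.Tag.IMG);
            tagProcessorFactory.addProcessor(new ImageTagProcessor(), HTML.Tag.IMG);

            // CSS -> HTML -> PDF pipeline; CSS font-family values are resolved through fontProvider
            final CssFilesImpl cssFiles = new CssFilesImpl();
            cssFiles.add(XMLWorkerHelper.getInstance().getDefaultCSS());
            final StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver(cssFiles);
            final HtmlPipelineContext hpc = new HtmlPipelineContext(new CssAppliersImpl(fontProvider));
            hpc.setAcceptUnknown(true).autoBookmark(true).setTagFactory(tagProcessorFactory);
            final HtmlPipeline htmlPipeline = new HtmlPipeline(hpc, new PdfWriterPipeline(document, writer));
            final Pipeline<?> pipeline = new CssResolverPipeline(cssResolver, htmlPipeline);
            final XMLWorker worker = new XMLWorker(pipeline, true);
            final Charset charset = StandardCharsets.UTF_8;
            final XMLParser xmlParser = new XMLParser(true, worker, charset);

            // Encode with the same charset the parser is told to use, not the platform default
            InputStream is2 = new ByteArrayInputStream(content.toString().getBytes(charset));
            xmlParser.parse(is2, charset);

            is2.close();
            document.close();
            file.close();

        } catch (Exception ex) {
            System.out.println("Exception in Class::" + getClass().getSimpleName() + "::Method::" + methodName + "::" + ex.getMessage());
            ex.printStackTrace();
            throw new Exception(ex);
        }
    }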

PDF generation works fine. The HTML content passed to the PDF conversion contains the special characters as proper HTML entities, like this:

    StringBuilder content = new StringBuilder();
    content.append("<html><body style=\"font-size:12.0pt; font-family:Arial\">"
            + "<p>Testes &rarr; &rarr; Vasa efferentia &rarr; Kidney &rarr; Seminal Vescile</p></body></html>");

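The generated PDF shows "?" instead of the special characters (the arrow symbols): "Testes ?? Vasa efferentia ? Kidney ? Seminal Vescile". Where am I going wrong? Please guide me.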
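A literal "?" is exactly what Java substitutes when it encodes a character that the target charset cannot represent. The sketch below reproduces that effect with plain JDK classes only. It assumes (not confirmed by the question) that at some point the content holds a literal arrow (U+2192, the character `&rarr;` denotes) and is encoded with a non-UTF-8 charset, for example by a `getBytes()` call with no explicit charset on a machine whose default is windows-1252, and is then parsed as UTF-8. `CharsetRoundTrip` and `roundTrip` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encodes text with one charset and decodes the bytes with another,
    // mimicking "encode with the platform default, parse as UTF-8".
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement bytes, which is '?' for windows-1252.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "Testes \u2192 Vasa efferentia"; // U+2192 RIGHTWARDS ARROW
        Charset cp1252 = Charset.forName("windows-1252");

        // Lossy: the arrow is replaced before the parser ever sees it
        System.out.println(roundTrip(text, cp1252, StandardCharsets.UTF_8)); // prints: Testes ? Vasa efferentia

        // Consistent: encoding and decoding agree, so the arrow survives
        String kept = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(kept.equals(text));
    }
}
```

If the second line of the example ever looks wrong on a Windows console, that is the console's own encoding at work; the returned string still contains U+2192.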

Tags: java, itext, html-to-pdf

Solution


The solution has almost nothing to do with the code, classes, or objects...

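You need to set the CSS "font-family" to a font that matches the output character set you are requesting, that is, a font that actually contains glyphs for the characters you use.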
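Whether a font "matches" the text also depends on the encoding it is used with: iText 5 fonts typically default to `BaseFont.WINANSI` (Cp1252) unless another encoding such as `BaseFont.IDENTITY_H` is chosen, and Cp1252 has no slot for U+2192. As a rough pre-check in plain Java, the sketch below lists the code points of a text that a given encoding cannot represent. It checks the encoding only, not whether a particular font file contains the glyphs. `GlyphCoverageCheck` and `unencodable` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

public class GlyphCoverageCheck {

    // Returns the distinct code points in text (as "U+XXXX", in order of first
    // appearance) that the given charset cannot encode.
    public static Set<String> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        Set<String> missing = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            // canEncode resets the encoder afterwards, so repeated calls are safe
            if (!encoder.canEncode(ch)) {
                missing.add(String.format("U+%04X", cp));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        String text = "Testes \u2192 \u2192 Vasa efferentia \u2192 Kidney \u2192 Seminal Vescile";
        System.out.println(unencodable(text, Charset.forName("windows-1252"))); // prints: [U+2192]
        System.out.println(unencodable(text, StandardCharsets.UTF_8));          // prints: []
    }
}
```

A non-empty result for Cp1252 suggests that a WinAnsi-encoded font cannot show those characters no matter which font family the CSS names.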

For example, if the special characters are inside a 'p' HTML tag, you can give it the required font family with a style like this:

    <HEAD>
    <style>
    p {
      font-family: -good-font-family-
    }
    </style>
    </HEAD>
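To put a rule like this into the `StringBuilder` content from the question, the HTML string needs a head with a style block before the body. The following is a minimal sketch, assuming XMLWorker applies `<style>` rules from the head as the answer implies; `withParagraphFont` is a hypothetical helper, and "Arial" stands in for whichever font family you have registered with the font provider.

```java
public class HtmlWithParagraphFont {

    // Hypothetical helper: wraps body markup in a document whose <head>
    // sets font-family for every <p>, following the CSS rule in the answer.
    public static String withParagraphFont(String bodyHtml, String fontFamily) {
        StringBuilder sb = new StringBuilder();
        sb.append("<html><head><style>")
          .append("p { font-family: ").append(fontFamily).append("; }")
          .append("</style></head><body>")
          .append(bodyHtml)
          .append("</body></html>");
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = withParagraphFont(
                "<p>Testes &rarr; Vasa efferentia &rarr; Kidney</p>", "Arial");
        System.out.println(html);
    }
}
```

Building the style into the document keeps the font choice next to the content; an inline style attribute on the body, as in the question, is the other option.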
