Docx4j export to PDF/A-1b - base14 fonts embedding

Problem description

I need to export a docx document to PDF/A-1b on an Ubuntu server, using the Apache FOP backend.

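The document is nothing fancy: it uses the basic Windows fonts Calibri, Courier New, Times New Roman, Symbol and Wingdings. The PDF/A-1b profile requires all fonts to be embedded, including the standard base-14 fonts, so I copied the Ubuntu Type1 fonts from /usr/share/fonts/type1/urw-base35. I now have 14 .pfb and 14 .afm files in /home/luca/Desktop/ubuntufonts/.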

I think I set everything up correctly, but enabling the A-1b profile results in the following exception:

Caused by: java.io.FileNotFoundException: Neither an AFM nor a PFM file was found for NimbusRoman-BoldItalic.pfb
    at org.apache.fop.fonts.type1.Type1FontLoader.read(Type1FontLoader.java:147)
    at org.apache.fop.fonts.FontLoader.getFont(FontLoader.java:126)
    at org.apache.fop.fonts.FontLoader.loadFont(FontLoader.java:110)
    at org.apache.fop.fonts.LazyFont.load(LazyFont.java:119)
...
 Caused by: java.lang.RuntimeException: Failed to read font file NimbusRoman-BoldItalic.pfb
    at org.apache.fop.fonts.LazyFont.load(LazyFont.java:132)
    at org.apache.fop.fonts.LazyFont.hasChar(LazyFont.java:179)
    at org.apache.fop.fonts.Font.hasChar(Font.java:278)
    at org.apache.fop.fonts.FontSelector.selectFontForCharacter(FontSelector.java:47)
    at org.apache.fop.fonts.FontSelector.selectFontForCharacterInText(FontSelector.java:85)
    at org.apache.fop.layoutmgr.inline.TextLayoutManager.initialize(TextLayoutManager.java:162)
    at org.apache.fop.layoutmgr.AbstractLayoutManager.getChildLM(AbstractLayoutManager.java:118)

But the file is right there:

luca@luca-vm:~/Desktop/ubuntufonts$ ls
D050000L.afm                 NimbusRoman-Italic.afm
D050000L.pfb                 NimbusRoman-Italic.pfb
NimbusMonoPS-Bold.afm        NimbusRoman-Regular.afm
NimbusMonoPS-BoldItalic.afm  NimbusRoman-Regular.pfb
NimbusMonoPS-BoldItalic.pfb  NimbusSans-Bold.afm
NimbusMonoPS-Bold.pfb        NimbusSans-BoldItalic.afm
NimbusMonoPS-Italic.afm      NimbusSans-BoldItalic.pfb
NimbusMonoPS-Italic.pfb      NimbusSans-Bold.pfb
NimbusMonoPS-Regular.afm     NimbusSans-Italic.afm
NimbusMonoPS-Regular.pfb     NimbusSans-Italic.pfb
NimbusRoman-Bold.afm         NimbusSans-Regular.afm
NimbusRoman-BoldItalic.afm   NimbusSans-Regular.pfb
NimbusRoman-BoldItalic.pfb   StandardSymbolsPS.afm
NimbusRoman-Bold.pfb         StandardSymbolsPS.pfb

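From searching the web, it seems the way forward is to create a fop.xml configuration file that maps the font names to the files I extracted. This is the file I prepared: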

<fop version="1.0">
    <font-base>/home/luca/Desktop/ubuntufonts/</font-base>
    <renderers>
        <renderer mime="application/pdf">
            <fonts>
                <font embed-url="NimbusSans-Regular.pfb" embedding-mode="full">
                    <font-triplet name="Helvetica" style="normal" weight="normal" />
                    <font-triplet name="Calibri" style="normal" weight="normal" />
                </font>
                <font embed-url="NimbusSans-Bold.pfb" embedding-mode="full">
                    <font-triplet name="Helvetica" style="normal" weight="bold" />
                    <font-triplet name="Calibri" style="normal" weight="bold" />
                </font>
                <font embed-url="NimbusSans-Italic.pfb" embedding-mode="full">
                    <font-triplet name="Helvetica" style="italic" weight="normal" />
                    <font-triplet name="Calibri" style="italic" weight="normal" />
                </font>
                <font embed-url="NimbusSans-BoldItalic.pfb" embedding-mode="full">
                    <font-triplet name="Helvetica" style="italic" weight="bold" />
                    <font-triplet name="Calibri" style="italic" weight="bold" />
                </font>

                <font embed-url="NimbusRoman-Regular.pfb" embedding-mode="full">
                    <font-triplet name="Times" style="normal" weight="normal" />
                    <font-triplet name="Times New Roman" style="normal" weight="normal" />
                </font>
                <font embed-url="NimbusRoman-Bold.pfb" embedding-mode="full">
                    <font-triplet name="Times" style="normal" weight="bold" />
                    <font-triplet name="Times New Roman" style="normal" weight="bold" />
                </font>
                <font embed-url="NimbusRoman-Italic.pfb" embedding-mode="full">
                    <font-triplet name="Times" style="italic" weight="normal" />
                    <font-triplet name="Times New Roman" style="italic" weight="normal" />
                </font>
                <font embed-url="NimbusRoman-BoldItalic.pfb" embedding-mode="full">
                    <font-triplet name="Times" style="italic" weight="bold" />
                    <font-triplet name="Times New Roman" style="italic" weight="bold" />
                </font>

                <font embed-url="NimbusMonoPS-Regular.pfb" embedding-mode="full">
                    <font-triplet name="Courier" style="normal" weight="normal" />
                    <font-triplet name="Courier New" style="normal" weight="normal" />
                </font>
                <font embed-url="NimbusMonoPS-Bold.pfb" embedding-mode="full">
                    <font-triplet name="Courier" style="normal" weight="bold" />
                    <font-triplet name="Courier New" style="normal" weight="bold" />
                </font>
                <font embed-url="NimbusMonoPS-Italic.pfb" embedding-mode="full">
                    <font-triplet name="Courier" style="italic" weight="normal" />
                    <font-triplet name="Courier New" style="italic" weight="normal" />
                </font>
                <font embed-url="NimbusMonoPS-BoldItalic.pfb" embedding-mode="full">
                    <font-triplet name="Courier" style="italic" weight="bold" />
                    <font-triplet name="Courier New" style="italic" weight="bold" />
                </font>

                <font embed-url="StandardSymbolsPS.pfb" embedding-mode="full">
                    <font-triplet name="Symbol" style="normal" weight="normal" />
                    <font-triplet name="Symbol" style="normal" weight="bold" />
                </font>

                <font embed-url="D050000L.pfb" embedding-mode="full">
                    <font-triplet name="ZapfDingbats" style="normal" weight="normal" />
                    <font-triplet name="ZapfDingbats" style="normal" weight="bold" />
                </font>
            </fonts>
        </renderer>
    </renderers>
</fop>
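A hand-written configuration like this is easy to get subtly wrong, so it can help to check it mechanically before running FOP. The sketch below is plain JDK code; the class and method names are hypothetical and it does not replicate FOP's own font loading. It parses a fop.xml string, resolves each embed-url against font-base as a URI, and reports a font-base without a URI scheme (flagged only because adding file: turned out to be the fix in the solution below), embed-url files that do not exist on disk, Type1 .pfb files with neither an .afm nor a .pfm next to them (the condition named in the stack trace above), and font triplets that are declared for more than one file.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FopFontConfigCheck {

    /** Returns human-readable problems found in a fop.xml configuration string. */
    public static List<String> check(String fopXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(fopXml.getBytes(StandardCharsets.UTF_8)));
        List<String> problems = new ArrayList<>();

        NodeList baseNodes = doc.getElementsByTagName("font-base");
        String base = baseNodes.getLength() > 0 ? baseNodes.item(0).getTextContent().trim() : "";
        URI baseUri = URI.create(base);
        if (baseUri.getScheme() == null) {
            problems.add("font-base has no URI scheme: " + base);
        }

        Map<String, String> seenTriplets = new HashMap<>();
        NodeList fonts = doc.getElementsByTagName("font");
        for (int i = 0; i < fonts.getLength(); i++) {
            Element font = (Element) fonts.item(i);
            String embedUrl = font.getAttribute("embed-url");

            // Only file: URIs and scheme-less paths can be checked on the local disk.
            URI fontUri = baseUri.resolve(embedUrl);
            String scheme = fontUri.getScheme();
            if (scheme == null || scheme.equals("file")) {
                File file = new File(fontUri.getPath());
                if (!file.isFile()) {
                    problems.add("missing font file: " + file);
                } else if (file.getName().toLowerCase().endsWith(".pfb")) {
                    // A Type1 .pfb needs its metrics in a sibling .afm or .pfm file.
                    String stem = file.getPath().substring(0, file.getPath().length() - 4);
                    if (!new File(stem + ".afm").isFile() && !new File(stem + ".pfm").isFile()) {
                        problems.add("no .afm or .pfm next to " + file);
                    }
                }
            }

            // Each name/style/weight triplet should point at exactly one font file.
            NodeList triplets = font.getElementsByTagName("font-triplet");
            for (int j = 0; j < triplets.getLength(); j++) {
                Element t = (Element) triplets.item(j);
                String key = t.getAttribute("name") + "," + t.getAttribute("style") + "," + t.getAttribute("weight");
                String previous = seenTriplets.putIfAbsent(key, embedUrl);
                if (previous != null) {
                    problems.add("triplet " + key + " mapped to both " + previous + " and " + embedUrl);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java FopFontConfigCheck path/to/fop.xml
        List<String> problems = check(Files.readString(Path.of(args[0])));
        if (problems.isEmpty()) {
            System.out.println("no problems found");
        }
        problems.forEach(System.out::println);
    }
}
```

Running it against a configuration file only reports what it finds; an empty report does not guarantee that FOP will accept the fonts.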

This is the conversion code I ended up using:

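import java.io.ByteArrayOutputStream;
import java.nio.file.Files;

import org.apache.fop.apps.FOUserAgent;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.convert.out.fo.renderers.FORendererApacheFOP;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.springframework.core.io.ClassPathResource;

        // classPathResource (not shown) is assumed to be a Spring ClassPathResource for the input .docx
        // Document loading (required)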
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(classPathResource.getFile());

        // Set up font mapper (optional)
        Mapper fontMapper = new IdentityPlusMapper();
        wordMLPackage.setFontMapper(fontMapper);

        // FO exporter setup (required)
        // .. the FOSettings object
        String fopConfig = Files.readString(new ClassPathResource("fop.xml").getFile().toPath());
        FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setApacheFopConfiguration(fopConfig);
        foSettings.setOpcPackage(wordMLPackage);

        FOUserAgent foUserAgent = FORendererApacheFOP.getFOUserAgent(foSettings);
        foUserAgent.getRendererOptions().put("pdf-a-mode", "PDF/A-1b");

        // PDF/A-1a, PDF/A-2a and PDF/A-3a require accessibility to be enabled
        // see further https://stackoverflow.com/a/54587413/1031689
        foUserAgent.setAccessibility(true); // suppress "missing language information" messages from FOUserAgent .processEvent

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);

        // Clean up, so any ObfuscatedFontPart temp files can be deleted 
        if (wordMLPackage.getMainDocumentPart().getFontTablePart()!=null) {
            wordMLPackage.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }       
        // This would also do it, via finalize() methods
        foSettings = null;
        wordMLPackage = null;

I also tried embedding the fonts directly in the Word document, and deleting the FOP font cache between attempts, but neither solved the problem.

Any ideas on how to solve this?

Tags: pdf, apache-fop, docx4j

Solution


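After more than two days of struggling, I finally found it. For some reason, the font-base element has to be prefixed with the URI scheme: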

<font-base>file:/home/luca/Desktop/ubuntufonts/</font-base>
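One plausible explanation for the "some reason", assuming FOP treats font-base as a URI (an assumption about FOP internals, not confirmed here): a bare path is a URI without a scheme, i.e. a relative reference, so anything resolved against it stays relative, and java.net.URI refuses to turn a relative URI into a URL. The sketch below only uses java.net.URI to show that difference; the class name and the .pfb-to-.afm helper are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class FontBaseUriDemo {

    /** Resolves a font file name against a font-base value, treating font-base as a URI. */
    public static URI resolve(String fontBase, String fileName) {
        return URI.create(fontBase).resolve(fileName);
    }

    /** Hypothetical helper: the same URI with another extension, e.g. the .afm next to a .pfb. */
    public static URI sibling(URI fontUri, String newExtension) {
        String s = fontUri.toString();
        return URI.create(s.substring(0, s.lastIndexOf('.')) + newExtension);
    }

    public static void main(String[] args) {
        String[] bases = {
            "/home/luca/Desktop/ubuntufonts/",      // bare path: no scheme, a relative reference
            "file:/home/luca/Desktop/ubuntufonts/"  // explicit file: scheme, an absolute URI
        };
        for (String base : bases) {
            URI afm = sibling(resolve(base, "NimbusRoman-BoldItalic.pfb"), ".afm");
            System.out.println(afm + " (absolute: " + afm.isAbsolute() + ")");
            try {
                // URI.toURL() throws IllegalArgumentException for a URI that is not absolute.
                System.out.println("  converts to URL: " + afm.toURL());
            } catch (IllegalArgumentException | MalformedURLException e) {
                System.out.println("  cannot be converted to a URL: " + e.getMessage());
            }
        }
    }
}
```

For the bare path, the resolved .afm location is not absolute and cannot be converted to a URL; with the file: prefix it can.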

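I'd also like to point out to future frustrated readers that there is actually no reason to use Type1 fonts to map the base-14 fonts. Do yourself a favor and map them with the OTF fonts instead (on my Ubuntu VM they are in /usr/share/fonts/opentype/urw-base35), so no extra AFM/PFM file lookup is needed.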

This is my final XML configuration file:

<fop version="1.0">
    <font-base>file:/home/luca/Desktop/ubuntuttf/</font-base>
    <use-cache>false</use-cache>
    <strict-configuration>true</strict-configuration>
    <renderers>
        <renderer mime="application/pdf">
            <fonts>
                <font embed-url="NimbusSans-Regular.otf">
                    <font-triplet name="Helvetica" style="normal" weight="normal" />
                    <font-triplet name="Calibri" style="normal" weight="normal" />
                    <font-triplet name="sans-serif" style="normal" weight="normal"/>
                    <font-triplet name="SansSerif" style="normal" weight="normal"/>
                </font>
                <font embed-url="NimbusSans-Bold.otf">
                    <font-triplet name="Helvetica" style="normal" weight="bold" />
                    <font-triplet name="Calibri" style="normal" weight="bold" />
                    <font-triplet name="sans-serif" style="normal" weight="bold"/>
                    <font-triplet name="SansSerif" style="normal" weight="bold"/>
                </font>
                <font embed-url="NimbusSans-Italic.otf">
                    <font-triplet name="Helvetica" style="italic" weight="normal" />
                    <font-triplet name="Calibri" style="italic" weight="normal" />
                    <font-triplet name="sans-serif" style="italic" weight="normal"/>
                    <font-triplet name="SansSerif" style="italic" weight="normal"/>
                </font>
                <font embed-url="NimbusSans-BoldItalic.otf">
                    <font-triplet name="Helvetica" style="italic" weight="bold" />
                    <font-triplet name="Calibri" style="italic" weight="bold" />
                    <font-triplet name="sans-serif" style="italic" weight="bold"/>
                    <font-triplet name="SansSerif" style="italic" weight="bold"/>
                </font>
    
                <font embed-url="NimbusRoman-Regular.otf">
                    <font-triplet name="Times" style="normal" weight="normal" />
                    <font-triplet name="Times New Roman" style="normal" weight="normal" />
                    <font-triplet name="serif" style="normal" weight="normal"/>
                    <font-triplet name="any" style="normal" weight="normal"/>
                </font>
                <font embed-url="NimbusRoman-Bold.otf">
                    <font-triplet name="Times" style="normal" weight="bold" />
                    <font-triplet name="Times New Roman" style="normal" weight="bold" />
                    <font-triplet name="serif" style="normal" weight="bold"/>
                    <font-triplet name="any" style="normal" weight="bold"/>
                </font>
                <font embed-url="NimbusRoman-Italic.otf">
                    <font-triplet name="Times" style="italic" weight="normal" />
                    <font-triplet name="Times New Roman" style="italic" weight="normal" />
                    <font-triplet name="serif" style="italic" weight="normal"/>
                    <font-triplet name="any" style="italic" weight="normal"/>
                </font>
                <font embed-url="NimbusRoman-BoldItalic.otf">
                    <font-triplet name="Times" style="italic" weight="bold" />
                    <font-triplet name="Times New Roman" style="italic" weight="bold" />
                    <font-triplet name="serif" style="italic" weight="bold"/>
                    <font-triplet name="any" style="italic" weight="bold"/>
                </font>
    
                <font embed-url="NimbusMonoPS-Regular.otf">
                    <font-triplet name="Courier" style="normal" weight="normal" />
                    <font-triplet name="Courier New" style="normal" weight="normal" />
                    <font-triplet name="monospace" style="normal" weight="normal"/>
                </font>
                <font embed-url="NimbusMonoPS-Bold.otf">
                    <font-triplet name="Courier" style="normal" weight="bold" />
                    <font-triplet name="Courier New" style="normal" weight="bold" />
                    <font-triplet name="monospace" style="normal" weight="bold"/>
                </font>
                <font embed-url="NimbusMonoPS-Italic.otf">
                    <font-triplet name="Courier" style="italic" weight="normal" />
                    <font-triplet name="Courier New" style="italic" weight="normal" />
                    <font-triplet name="monospace" style="italic" weight="normal"/>
                </font>
                <font embed-url="NimbusMonoPS-BoldItalic.otf">
                    <font-triplet name="Courier" style="italic" weight="bold" />
                    <font-triplet name="Courier New" style="italic" weight="bold" />
                    <font-triplet name="monospace" style="italic" weight="bold"/>
                </font>
    
                <font embed-url="StandardSymbolsPS.otf">
                    <font-triplet name="Symbol" style="normal" weight="normal" />
                    <font-triplet name="Symbol" style="normal" weight="bold" />
                </font>
    
                <font embed-url="D050000L.otf">
                    <font-triplet name="ZapfDingbats" style="normal" weight="normal" />
                    <font-triplet name="ZapfDingbats" style="normal" weight="bold" />
                </font>
            </fonts>
        </renderer>
    </renderers>
</fop>
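Since every family follows the same four-variant pattern, the font entries can also be generated rather than typed by hand, which avoids copy-paste slips such as a triplet pointing at the wrong weight or style. A minimal sketch, assuming the urw-base35 naming shown above (Family-Regular, -Bold, -Italic, -BoldItalic); the class and method names are hypothetical, and the output is meant to be pasted inside the fonts element.

```java
import java.util.List;

public class FopFontEntries {

    // The four urw-base35 variants and the FO style/weight each one should serve.
    private static final String[][] VARIANTS = {
        {"Regular", "normal", "normal"},
        {"Bold", "normal", "bold"},
        {"Italic", "italic", "normal"},
        {"BoldItalic", "italic", "bold"},
    };

    /** Builds one font element per variant, mapping every alias to that variant's style/weight. */
    public static String fontEntries(String filePrefix, String extension, List<String> aliases) {
        StringBuilder xml = new StringBuilder();
        for (String[] v : VARIANTS) {
            xml.append("<font embed-url=\"").append(filePrefix).append('-').append(v[0])
               .append(extension).append("\">\n");
            for (String alias : aliases) {
                xml.append("    <font-triplet name=\"").append(alias)
                   .append("\" style=\"").append(v[1])
                   .append("\" weight=\"").append(v[2]).append("\"/>\n");
            }
            xml.append("</font>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(fontEntries("NimbusSans", ".otf",
                List.of("Helvetica", "Calibri", "sans-serif", "SansSerif")));
        System.out.print(fontEntries("NimbusRoman", ".otf",
                List.of("Times", "Times New Roman", "serif", "any")));
        System.out.print(fontEntries("NimbusMonoPS", ".otf",
                List.of("Courier", "Courier New", "monospace")));
    }
}
```

StandardSymbolsPS and D050000L have only one file each, so those two entries are simpler to keep by hand.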

Also, if anyone is interested in embedding the fonts inside a jar/war archive, just change the font-base element to <font-base>classpath:/fonts/</font-base> and add your font files to /src/main/resources/fonts/.
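The same packaging concern applies to fop.xml itself: the conversion code in the question reads it via new ClassPathResource("fop.xml").getFile(), and getFile() only works while the resource is a real file on disk, not an entry inside a jar/war. Reading it as a stream works in both cases. A minimal plain-JDK sketch (the class and method names are hypothetical); with Spring, ClassPathResource.getInputStream() would serve the same purpose.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ClasspathText {

    /** Reads a classpath resource (e.g. "fop.xml") as UTF-8 text, from a directory or from inside a jar/war. */
    public static String read(String resourceName) throws IOException {
        ClassLoader loader = ClasspathText.class.getClassLoader();
        // try-with-resources tolerates a null resource; the explicit check turns it into a clear error.
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new FileNotFoundException("classpath resource not found: " + resourceName);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String fopConfig = read("fop.xml");
        System.out.println(fopConfig.length() + " characters read");
    }
}
```

The resulting string can then be passed to foSettings.setApacheFopConfiguration(...) in place of the Files.readString(...) call.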
