Apache POI breaks the table of contents after replacing strings in a doc file

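Problem description

I have to replace some strings in a *.doc file. (I know it would be easier with *.docx.) When I do more than one replacement, the table of contents gets corrupted. Is there a way to preserve the table of contents?

I have two pieces of code that seem to produce the same output.

The faster code: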

Map<String, String> items = new HashMap<>();
items.put("toreplace1", "replacement1");
items.put("toreplace2", "replacement2");
try (POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("c:\\doc\\mydocument.doc")); HWPFDocument doc = new HWPFDocument(fs);) {
    Range r1 = doc.getRange();
    items.forEach((k, v) -> {
        for (int i = 0; i < r1.numSections(); ++i) {
            Section s = r1.getSection(i);
            for (int x = 0; x < s.numParagraphs(); x++) {
                Paragraph p = s.getParagraph(x);
                for (int z = 0; z < p.numCharacterRuns(); z++) {
                    CharacterRun run = p.getCharacterRun(z);
                    String text = run.text();
                    if (text.contains(k)) {
                        run.replaceText(k, v);
                    }
                }
            }
        }
    });
    // Close the output stream once the document has been written.
    try (FileOutputStream out = new FileOutputStream("c:\\doc\\mydocument_replaced.doc")) {
        doc.write(out);
    }
}

The slower code:

Map<String, String> items = new HashMap<>();
items.put("toreplace1", "replacement1");
items.put("toreplace2", "replacement2");
try (HWPFDocument doc = new HWPFDocument(new FileInputStream(new File("c:\\doc\\mydocument.doc")))) {
    Range range = doc.getRange();
    items.forEach((k, v) -> {
        range.replaceText(k, v);
    });
    // Close the output stream once the document has been written.
    try (FileOutputStream out = new FileOutputStream("c:\\doc\\mydocument_replaced.doc")) {
        doc.write(out);
    }
}

Tags: apache-poi, doc, tableofcontents

Solution


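To avoid corrupting Word fields (a TOC is also a field), you can restrict the replacement to the parts of the document that lie outside fields.

A field always starts with a run containing the byte 0x13 and ends with a run containing the byte 0x15. The following code therefore restricts the replacement to the parts of the document outside fields. With this code the TOC should not get corrupted. Of course, the TOC will then not be up to date and must be updated manually in Word (Ctrl+A, then F9).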

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.HashMap;
import java.util.Map;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

public class WordReplaceTextInHWPFRuns {

 public static void main(String[] args) throws Exception {

  Map<String, String> items = new HashMap<>();
  items.put("toreplace1", "replacement1");
  items.put("toreplace2", "replacement2");

  try (HWPFDocument doc = new HWPFDocument(new FileInputStream("mydocument.doc"))) {

   Range range = doc.getRange();
   for (Map.Entry<String, String> item : items.entrySet()) {
    String k = item.getKey();
    String v = item.getValue();

    // Start each pass over the runs outside of any field.
    boolean insideField = false;

    for (int r = 0; r < range.numCharacterRuns(); r++) {
     CharacterRun run = range.getCharacterRun(r);
     String text = run.text();

     // Debug output: the raw text of every run.
     System.out.println(text);

     // 0x13 marks the start of a field, 0x15 marks its end.
     if (text.indexOf('\u0013') > -1) insideField = true;
     if (text.indexOf('\u0015') > -1) insideField = false;

     // Replace only in runs that are not part of a field.
     if (text.contains(k) && !insideField) {
      run.replaceText(k, v);

      // Debug output: the run after the replacement.
      System.out.println("===========REPLACED=============");
      System.out.println(run.text());
     }
    }
   }

   // Write the result and close the output stream afterwards.
   try (FileOutputStream out = new FileOutputStream("mydocument_replaced.doc")) {
    doc.write(out);
   }
  }
 }
}
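
One limitation of the single boolean flag above: Word fields can be nested, and the flag flips back to false at the first 0x15 even if an outer field is still open. The following is a minimal sketch of a nesting-aware variant that counts depth instead. It works on plain strings rather than on POI objects so it can be run without a document; the class name FieldDepthSketch and the method outsideFields are hypothetical names introduced for illustration, not part of Apache POI, and the sample run texts are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FieldDepthSketch {

    // Field marker characters as described in the answer above.
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_END = '\u0015';

    /**
     * For each run text, reports whether the run lies completely outside
     * every field. A run that contains a marker itself counts as inside.
     */
    public static List<Boolean> outsideFields(List<String> runTexts) {
        List<Boolean> result = new ArrayList<>();
        int depth = 0; // number of fields currently open
        for (String text : runTexts) {
            boolean touchesField = depth > 0;
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (c == FIELD_BEGIN) {
                    depth++;
                    touchesField = true;
                } else if (c == FIELD_END) {
                    if (depth > 0) depth--; // tolerate an unbalanced end marker
                    touchesField = true;
                }
            }
            result.add(!touchesField);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical run texts: an outer field containing a nested field.
        List<String> runs = Arrays.asList(
            "toreplace1 ", "\u0013", " TOC ", "\u0013", " PAGEREF ",
            "\u0015", "toreplace1", "\u0015", " toreplace2");
        // prints [true, false, false, false, false, false, false, false, true]
        System.out.println(outsideFields(runs));
    }
}
```

In this sample, the seventh run ("toreplace1") sits after the inner field has ended but before the outer one has, so the depth counter keeps it protected, whereas a plain boolean flag would allow the replacement there. To apply the idea to the code above, the same depth counter could replace insideField inside the loop over range.getCharacterRun(r).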
