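java - HTML parser for Apache POI docx, to insert HTML into a paragraph
Question
I'm trying to insert HTML into a docx with Apache POI. Jsoup is great for parsing the HTML, and this answer helped me a lot, but I'm stuck inserting UL and LI elements into the docx, because inserting paragraphs at a new position is causing problems.
The question that helped me: How to set different styles for the same paragraph
The UL parser I added: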
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.poi.xwpf.usermodel.XWPFAbstractNum;
import org.apache.poi.xwpf.usermodel.XWPFNumbering;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAbstractNum;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTLvl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STLevelSuffix;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STNumberFormat;

import com.steadystate.css.parser.CSSOMParser;

public class UnorderedListParser implements NodeVisitor {
    String nodeName;
    boolean needNewRun;
    boolean isItalic;
    boolean isBold;
    boolean isUnderlined;
    int fontSize;
    boolean insertImage = false;
    Node anchorNode = null;
    boolean liStarted = false;
    String fontColor;
    final CSSOMParser parser = new CSSOMParser();
    XWPFParagraph paragraph;
    XWPFRun run;
    List<String> textList = new ArrayList<String>();

    UnorderedListParser(XWPFParagraph paragraph) {
        this.paragraph = paragraph;
        this.run = paragraph.createRun();
        this.nodeName = "";
        this.needNewRun = false;
        this.isItalic = false;
        this.isBold = false;
        this.isUnderlined = false;
        this.fontSize = 11;
        this.fontColor = "000000";
        this.insertImage = false;
    }

    @Override
    public void head(Node node, int depth) {
        nodeName = node.nodeName();
        System.out.println("Start1 " + nodeName + ": " + node);
        if ("li".equals(nodeName)) {
            liStarted = true;
        }
        // collect the text of every <li> until the enclosing </ul> is reached
        if ("#text".equals(nodeName)) {
            if (liStarted) {
                textList.add(((TextNode) node).text());
            }
        }
    }

    @Override
    public void tail(Node node, int depth) {
        nodeName = node.nodeName();
        System.out.println("End1 " + nodeName);
        if ("li".equals(nodeName)) {
            liStarted = false;
        }
        if ("ul".equals(nodeName)) {
            try {
                System.out.println("going into createBulletList");
                createBulletList(paragraph, run, textList);
                run = paragraph.createRun();
            } catch (Exception e) {
                // print the real cause instead of swallowing it
                e.printStackTrace();
            }
        }
    }

    public static Map<String, String> getStyleMap(Node element) {
        Map<String, String> keymaps = new HashMap<>();
        if (!element.hasAttr("style")) {
            return keymaps;
        }
        String styleStr = element.attr("style"); // => margin-top:-80px !important;color:#fcc;border-bottom:1px solid #ccc; background-color: #333; text-align:center
        String[] keys = styleStr.split(":");
        String[] split;
        if (keys.length > 1) {
            for (int i = 0; i < keys.length; i++) {
                if (i % 2 != 0) {
                    split = keys[i].split(";");
                    if (split.length == 1) break;
                    keymaps.put(split[1].trim(), keys[i + 1].split(";")[0].trim());
                } else {
                    split = keys[i].split(";");
                    if (i + 1 == keys.length) break;
                    keymaps.put(keys[i].split(";")[split.length - 1].trim(), keys[i + 1].split(";")[0].trim());
                }
            }
        }
        return keymaps;
    }

    public static void createBulletList(XWPFParagraph paragraph, XWPFRun run, List<String> documentList) {
        System.out.println("all good");
        CTAbstractNum cTAbstractNum = CTAbstractNum.Factory.newInstance();
        // Next we set the AbstractNumId. This requires care.
        // Since we are in a new document we can start numbering from 0.
        // But if we have an existing document, we must determine the next free number first.
        cTAbstractNum.setAbstractNumId(BigInteger.valueOf(0));
        // Bullet list
        CTLvl cTLvl = cTAbstractNum.addNewLvl();
        cTLvl.addNewNumFmt().setVal(STNumberFormat.BULLET);
        cTLvl.addNewSuff().setVal(STLevelSuffix.SPACE);
        cTLvl.addNewLvlText().setVal("•");
        XWPFAbstractNum abstractNum = new XWPFAbstractNum(cTAbstractNum);
        XWPFNumbering numbering = paragraph.getDocument().createNumbering();
        BigInteger abstractNumID = numbering.addAbstractNum(abstractNum);
        BigInteger numID = numbering.addNum(abstractNumID);
        XmlCursor cursor;
        for (String string : documentList) {
            paragraph.setNumID(numID);
            // font size for bullet point in half pt
            paragraph.getCTP().getPPr().addNewRPr().addNewSz().setVal(BigInteger.valueOf(22));
            run = paragraph.createRun();
            run.setText(string);
            run.setFontSize(11);
            // move to the next element after this paragraph and insert a new paragraph before it
            cursor = paragraph.getCTP().newCursor();
            cursor.toEndToken();
            while (cursor.toNextToken() != XmlCursor.TokenType.START);
            paragraph = paragraph.getDocument().insertNewParagraph(cursor);
        }
        System.out.println("all good1");
        cursor = paragraph.getCTP().newCursor();
        cursor.toEndToken();
        while (cursor.toNextToken() != XmlCursor.TokenType.START);
        paragraph = paragraph.getDocument().insertNewParagraph(cursor);
    }

    public XWPFParagraph returnParagraph() {
        return paragraph;
    }
}
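One likely reason the content after the list ends up in the wrong place: `createBulletList` reassigns its `paragraph` parameter each time it inserts a new paragraph, but Java passes object references by value, so the visitor's own `paragraph` field (and therefore `returnParagraph()` and the `paragraph.createRun()` call in `tail`) keeps pointing at the first list paragraph. The sketch below reproduces this without POI, using hypothetical stand-ins (`FakeDoc`, with a `StringBuilder` playing the role of an `XWPFParagraph`), and shows the usual fix of returning the paragraph that follows the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for XWPFDocument: an ordered list of paragraphs,
// where a StringBuilder plays the role of an XWPFParagraph.
class FakeDoc {
    final List<StringBuilder> paragraphs = new ArrayList<>();

    // Rough analogue of insertNewParagraph(cursor): adds an empty
    // paragraph directly after `p` and returns it.
    StringBuilder insertAfter(StringBuilder p) {
        StringBuilder next = new StringBuilder();
        // indexOf uses equals(), which is identity for StringBuilder
        paragraphs.add(paragraphs.indexOf(p) + 1, next);
        return next;
    }
}

class BulletDemo {
    // Same shape as the original createBulletList: the parameter is
    // reassigned, but the caller never sees the new paragraph.
    static void writeItemsVoid(FakeDoc doc, StringBuilder paragraph, List<String> items) {
        for (String item : items) {
            paragraph.append(item);
            paragraph = doc.insertAfter(paragraph); // only the local copy moves
        }
    }

    // Fixed variant: return the paragraph that follows the list so the
    // caller can assign it back to its own field.
    static StringBuilder writeItems(FakeDoc doc, StringBuilder paragraph, List<String> items) {
        for (String item : items) {
            paragraph.append(item);
            paragraph = doc.insertAfter(paragraph);
        }
        return paragraph;
    }
}
```

With items `1`, `2`, `3`, appending `CRM` through the caller's reference after `writeItemsVoid` turns the first paragraph into `1CRM`, whereas after `field = writeItems(...)` the text lands in the empty paragraph following the list. Applied to the parser, that would mean letting `createBulletList` return `XWPFParagraph` and writing `paragraph = createBulletList(paragraph, run, textList);` in `tail`. If the same parser instance handles more than one `<ul>`, `textList` probably also needs a `clear()` there, since it is never emptied and a second list would repeat the first one's items.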
The HTML I'm using:
<p><a href="" target="_blank">dda</a></p><p><br></p><ul><li>1</li><li>2</li><li>3</li></ul><p>CRM stands for<span style="color: rgb(0, 0, 0);"> “custom</span><span style="color: rgb(57, 92, 92);">er relationship management” and it’s software that stores customer contact information like names, addresses, and phone numbers, as well as keeps track of customer activity like website visits, pho</span><span style="color: rgb(0, 0, 0);">ne calls, email, and more.</span></p><p><span style="color: rgb(0, 0, 0);">Discover Customer 360, the world’s #1 CR</span>M. Connect to your customers in a more intelligent way by uniting sales, service, marketing, commerce, IT, and analytics. All powered by our global community of Trailblazers.</p><p><br></p><p><br></p><p><img src="https://test.com" alt="Porter charge to drop laptop from home2.jpg"></img></p>
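The `<span>` elements in this HTML carry their colors as `rgb(r, g, b)`, while the parser's `fontColor` default (`"000000"`) and `XWPFRun.setColor(String)` use a six-digit hex string. A minimal sketch of a simpler alternative to `getStyleMap` plus an `rgb(...)`-to-hex converter follows; `StyleUtil`, `parseStyle` and `rgbToHex` are hypothetical names. It splits declarations on `;` and then on the first `:` only, so values containing `:` (such as `url(http://...)`) stay intact; it assumes no `;` appears inside a value and handles only the `rgb(...)` form, not `#fcc` or named colors.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StyleUtil {
    private static final Pattern RGB = Pattern.compile(
        "rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    // "color: rgb(0, 0, 0); text-align:center" -> {color=rgb(0, 0, 0), text-align=center}
    static Map<String, String> parseStyle(String style) {
        Map<String, String> map = new LinkedHashMap<>();
        if (style == null) return map;
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':'); // first ':' only
            if (colon <= 0) continue;      // skip empty or malformed declarations
            String key = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            if (!key.isEmpty()) map.put(key, value);
        }
        return map;
    }

    // "rgb(57, 92, 92)" -> "395C5C"; returns null for anything that is not rgb(...)
    static String rgbToHex(String css) {
        if (css == null) return null;
        Matcher m = RGB.matcher(css.trim());
        if (!m.matches()) return null;
        StringBuilder hex = new StringBuilder();
        for (int i = 1; i <= 3; i++) {
            int channel = Math.min(255, Integer.parseInt(m.group(i))); // clamp to 0..255
            hex.append(String.format(Locale.ROOT, "%02X", channel));
        }
        return hex.toString();
    }
}
```

In the paragraph parser the result could then be applied with something like `String hex = StyleUtil.rgbToHex(styles.get("color")); if (hex != null) run.setColor(hex);`.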