HTML parser for Apache POI docx, for inserting HTML into a paragraph

Problem description

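I am trying to insert HTML into a docx using Apache POI. Jsoup works very well for parsing the HTML, and this answer helped me a lot, but I am stuck on inserting UL and LI elements into the docx, because inserting a paragraph at a new position is causing problems.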

The question that helped me: How to define different styles within the same paragraph

The UL parser I added:

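import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.poi.xwpf.usermodel.XWPFAbstractNum;
import org.apache.poi.xwpf.usermodel.XWPFNumbering;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTAbstractNum;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTLvl;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STLevelSuffix;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STNumberFormat;
// Assumed source of CSSOMParser: the CSS Parser (cssparser) library.
import com.steadystate.css.parser.CSSOMParser;

public class UnorderedListParser implements NodeVisitor {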

      String nodeName;
      boolean needNewRun;
      boolean isItalic;
      boolean isBold;
      boolean isUnderlined;
      int fontSize;
      boolean insertImage = false;
      Node anchorNode= null;
      boolean liStarted = false;
      String fontColor;
      final CSSOMParser parser = new CSSOMParser();

      XWPFParagraph paragraph;
      XWPFRun run;
     
      List<String> textList = new ArrayList<String>();
      UnorderedListParser(XWPFParagraph paragraph) {
       this.paragraph = paragraph;
       this.run = paragraph.createRun();
       this.nodeName = "";
       this.needNewRun = false;
       this.isItalic = false;
       this.isBold = false;
       this.isUnderlined = false;
       this.fontSize = 11;
       this.fontColor = "000000";
       this.insertImage = false;
       
      }

      @Override
      public void head(Node node, int depth) {
          nodeName = node.nodeName();
          System.out.println("Start1 "+nodeName+": " + node);
          if("li".equals(nodeName)) {
              liStarted = true;
          }
          if ("#text".equals(nodeName)) {
              if(liStarted) {
                
                  textList.add(((TextNode)node).text());
              }
          }
        
       
      }

      @Override
      public void tail(Node node, int depth) {
       nodeName = node.nodeName();

            System.out.println("End1 "+nodeName);
            if("li".equals(nodeName)) {
                  liStarted = false;
              }
            if("ul".equals(nodeName)) {
                try {
                    System.out.println("going into create bullet list");
                    createBulletList(paragraph ,run, textList);
                    run = paragraph.createRun();
                }catch(Exception e) {
                    System.out.println("into exception");
                }
            }
      }
      
      
      public static  Map<String, String> getStyleMap(Node element) {
            Map<String, String> keymaps = new HashMap<>();
            if (!element.hasAttr("style")) {
                return keymaps;
            }
            String styleStr = element.attr("style"); // => margin-top:-80px !important;color:#fcc;border-bottom:1px solid #ccc; background-color: #333; text-align:center
            String[] keys = styleStr.split(":");
            String[] split;
            if (keys.length > 1) {
                for (int i = 0; i < keys.length; i++) {
                    if (i % 2 != 0) {
                        split = keys[i].split(";");
                        if (split.length == 1) break;
                        keymaps.put(split[1].trim(), keys[i + 1].split(";")[0].trim());
                    } else {
                        split = keys[i].split(";");
                        if (i + 1 == keys.length) break;
                        keymaps.put(keys[i].split(";")[split.length - 1].trim(), keys[i + 1].split(";")[0].trim());
                    }
                }
            }
            return keymaps;
        }
      
      public static void createBulletList(XWPFParagraph paragraph ,XWPFRun run,  List<String> documentList) {
          System.out.println("all good");
          CTAbstractNum cTAbstractNum = CTAbstractNum.Factory.newInstance();
          //Next we set the AbstractNumId. This requires care.
          //Since we are in a new document we can start numbering from 0.
          //But if we have an existing document, we must determine the next free number first.
          cTAbstractNum.setAbstractNumId(BigInteger.valueOf(0));
    
          //Bullet list
          CTLvl cTLvl = cTAbstractNum.addNewLvl();
          cTLvl.addNewNumFmt().setVal(STNumberFormat.BULLET);
          cTLvl.addNewSuff().setVal(STLevelSuffix.SPACE);
          cTLvl.addNewLvlText().setVal("•");
    
          XWPFAbstractNum abstractNum = new XWPFAbstractNum(cTAbstractNum);
          XWPFNumbering numbering = paragraph.getDocument().createNumbering();
    
          BigInteger abstractNumID = numbering.addAbstractNum(abstractNum);
          BigInteger numID = numbering.addNum(abstractNumID);
         
          
          XmlCursor  cursor;
          for (String string : documentList) {
              
              paragraph.setNumID(numID);
               // font size for bullet point in half pt
              paragraph.getCTP().getPPr().addNewRPr().addNewSz().setVal(BigInteger.valueOf(22));
               run = paragraph.createRun();
               run.setText(string);
               run.setFontSize(11);
               cursor = paragraph.getCTP().newCursor();
              cursor.toEndToken();
              while(cursor.toNextToken() != org.apache.xmlbeans.XmlCursor.TokenType.START);

              paragraph =paragraph.getDocument().insertNewParagraph(cursor);
          }
          System.out.println("all good1");
          cursor = paragraph.getCTP().newCursor();
          cursor.toEndToken();
          while(cursor.toNextToken() != org.apache.xmlbeans.XmlCursor.TokenType.START);

          paragraph =paragraph.getDocument().insertNewParagraph(cursor);

      }
      
      public XWPFParagraph returnParagraph() {
          return paragraph;
          
      }

}
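
A side note on the `getStyleMap` helper above: it splits the style string on `:` before `;`, so the index arithmetic is hard to follow, and it would likely misparse a value that itself contains a colon (for example a `url(http://...)`). As a minimal sketch, not the original author's method, the declarations can be split on `;` first and each declaration on its first `:`. The names `StyleParser` and `parseStyle` are hypothetical, and the sketch takes the raw attribute string (what `element.attr("style")` returns) instead of a Jsoup `Node`, so it runs without any dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; not part of Jsoup or Apache POI.
final class StyleParser {

    // Parses an inline style string such as "color:#fcc; text-align:center"
    // into a property -> value map, keeping declaration order.
    static Map<String, String> parseStyle(String style) {
        Map<String, String> result = new LinkedHashMap<>();
        if (style == null || style.trim().isEmpty()) {
            return result;
        }
        for (String declaration : style.split(";")) {
            // Split on the FIRST colon only, so values containing ':' survive.
            int colon = declaration.indexOf(':');
            if (colon <= 0) {
                continue; // empty or malformed declaration
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim();
            if (!property.isEmpty() && !value.isEmpty()) {
                result.put(property, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> styles = parseStyle(
            "margin-top:-80px !important;color:#fcc;border-bottom:1px solid #ccc; "
            + "background-color: #333; text-align:center");
        System.out.println(styles.get("color"));            // #fcc
        System.out.println(styles.get("background-color")); // #333
    }
}
```

Inside the visitor, the call would presumably look like `StyleParser.parseStyle(node.attr("style"))`, with the resulting map consulted when creating each run.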

This is what it renders:

The HTML I am using:

<p><a href="" target="_blank">dda</a></p><p><br></p><ul><li>1</li><li>2</li><li>3</li></ul><p>CRM stands for<span style="color: rgb(0, 0, 0);"> “custom</span><span style="color: rgb(57, 92, 92);">er relationship management” and it’s software that stores customer contact information like names, addresses, and phone numbers, as well as keeps track of customer activity like website visits, pho</span><span style="color: rgb(0, 0, 0);">ne calls, email, and more.</span></p><p><span style="color: rgb(0, 0, 0);">Discover Customer 360, the world’s #1 CR</span>M. Connect to your customers in a more intelligent way by uniting sales, service, marketing, commerce, IT, and analytics. All powered by our global community of Trailblazers.</p><p><br></p><p><br></p><p><img src="https://test.com" alt="Porter charge to drop laptop from home2.jpg"></img></p>
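
The `<span>` colors in this HTML use the CSS `rgb(r, g, b)` form, while the parser stores `fontColor` as a six-digit hex string such as `"000000"`, which is also the form `XWPFRun.setColor` expects. The following is a minimal sketch of that conversion, assuming only the plain three-integer `rgb(...)` syntax seen here (no percentages, no alpha channel). `CssColor` and `rgbToHex` are hypothetical names, not part of POI or Jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of Jsoup or Apache POI.
final class CssColor {

    // Matches "rgb(57, 92, 92)" with optional whitespace around the numbers.
    private static final Pattern RGB = Pattern.compile(
        "rgb\\(\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*,\\s*(\\d{1,3})\\s*\\)");

    // Converts a CSS rgb(...) value to "RRGGBB"; returns the fallback
    // when the input is null or not in the expected form.
    static String rgbToHex(String cssColor, String fallback) {
        if (cssColor == null) {
            return fallback;
        }
        Matcher m = RGB.matcher(cssColor.trim());
        if (!m.matches()) {
            return fallback;
        }
        // Clamp each channel to the valid 0..255 range.
        int r = Math.min(255, Integer.parseInt(m.group(1)));
        int g = Math.min(255, Integer.parseInt(m.group(2)));
        int b = Math.min(255, Integer.parseInt(m.group(3)));
        return String.format("%02X%02X%02X", r, g, b);
    }

    public static void main(String[] args) {
        System.out.println(rgbToHex("rgb(57, 92, 92)", "000000")); // 395C5C
        System.out.println(rgbToHex("rgb(0, 0, 0)", "000000"));    // 000000
    }
}
```

The fallback argument mirrors the parser's default of `"000000"`, so an unrecognized color value leaves the text black.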

Tags: java, apache-poi, jsoup, docx

Solution

