Append an XML/HTML tag only when it is outside a specific tag (Java/jsoup)

Problem description

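There are two cases:

  1. If an <if> tag lies outside any <except> tag, add an opening <print> tag right after it and a closing </print> tag right before the corresponding </if> tag.

  2. If a <print> tag is already associated with the <if> tag, it does not need to be added again.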

The input XML is:

<if>
  <except>
    <if>
      <except>
        <if />
      </except>
    </if>
  </except>
</if>

The expected output is:

<if>
  <print>
    <except>
      <if>
        <except>
          <if />
        </except>
      </if>
    </except>
  </print>
</if>

How can I achieve this?

Tags: java, html, parsing, jsoup

Solution


The explanation is in the code comments:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.select.Elements;

public class StackOverflow58484337 {

    public static void main(String[] args) {
        String html = "<if><except><if><except><if /></except></if></except></if>";
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        // select every "if" element
        Elements ifs = doc.select("if");
        System.out.println("--- before:");
        System.out.println(doc);
        // for every "if" element, check whether any of its ancestors is an "except" element
        for (Element singleIf : ifs) {
            if (isOutsideExcept(singleIf)) {
                // wrap each of its child elements in a "print" element
                singleIf.children().wrap("<print>");
            }
        }
        System.out.println("--- after:");
        System.out.println(doc);
    }

    private static boolean isOutsideExcept(Element singleIf) {
        Element parent = singleIf.parent();
        // check the parent, then the parent's parent, and so on up to the root
        while (parent != null) {
            if (parent.tagName().equals("except")) {
                return false;
            }
            parent = parent.parent();
        }
        return true;
    }

}

Output:

--- before:
<if>
 <except>
  <if>
   <except>
    <if />
   </except>
  </if>
 </except>
</if>
--- after:
<if>
 <print>
  <except>
   <if>
    <except>
     <if />
    </except>
   </if>
  </except>
 </print>
</if>
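
The answer above does not check for case 2 (a <print> that is already present), and because it wraps children one by one, an <if> with several children would get several <print> elements. The following is a minimal sketch of one way to cover both points. It is not the answer's method: it uses the JDK's built-in org.w3c.dom API instead of jsoup so that it runs without extra dependencies, and the names PrintWrapper, wrapOutsideExcept and hasPrintChild are made up for illustration. It assumes "already associated" means the <if> has a direct <print> child.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class and method names, for illustration only.
class PrintWrapper {

    // Parses the XML, wraps the content of qualifying "if" elements, and serializes the result.
    static String wrapOutsideExcept(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getElementsByTagName returns a live list, so copy it before editing the tree
            NodeList live = doc.getElementsByTagName("if");
            List<Element> ifs = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                ifs.add((Element) live.item(i));
            }
            for (Element singleIf : ifs) {
                if (singleIf.hasChildNodes() && isOutsideExcept(singleIf) && !hasPrintChild(singleIf)) {
                    Element print = doc.createElement("print");
                    // appendChild moves a node, so this loop drains every child into <print>
                    while (singleIf.getFirstChild() != null) {
                        print.appendChild(singleIf.getFirstChild());
                    }
                    singleIf.appendChild(print);
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not process XML", e);
        }
    }

    // Case 1: true when no ancestor of the element is named "except".
    static boolean isOutsideExcept(Element singleIf) {
        Node parent = singleIf.getParentNode();
        while (parent != null) {
            if ("except".equals(parent.getNodeName())) {
                return false;
            }
            parent = parent.getParentNode();
        }
        return true;
    }

    // Case 2 (assumed meaning): true when the element already has a direct "print" child.
    static boolean hasPrintChild(Element singleIf) {
        NodeList children = singleIf.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "print".equals(child.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<if><except><if><except><if /></except></if></except></if>";
        String once = wrapOutsideExcept(xml);
        System.out.println(once);
        // Running it again on its own output should add nothing, because of the case 2 check
        System.out.println(wrapOutsideExcept(once));
    }
}
```

The same two checks should carry over to the jsoup version: the ancestor walk is already there in isOutsideExcept, and the "already has a <print>" test would be an extra condition next to it before wrapping.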
