Split XML into smaller chunks based on a grandchild's id

Problem description

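I have an XML file that should be split into smaller chunks by unique BookId node. Basically, I need to filter each book into a separate XML file that has the same structure as the initial XML.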

The purpose is a requirement to validate each smaller XML against an XSD, to determine which Book/PendingBook is invalid.

Note that the Books node can contain both Book and PendingBook nodes.

Initial XML:

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

The result should look like the following XML files:

Book_001.xml (BookId = 001):

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

Book_002.xml (BookId = 002):

<Main xmlns="http://some/url/name">
  <Books>

    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

PendingBook_003.xml (BookId = 003):

<Main xmlns="http://some/url/name">
  <Books>

    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>

    <OtherInfo>...</OtherInfo>

  </Books>
</Main>

So far I have only managed to extract each ID node into a smaller XML, creating the root element manually.

Ideally, I want to copy all elements from the initial XML and put a single Book/PendingBook node into the Books node.

My Java example:

package com.main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXmls {
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception
    {
        String inputFile = "C:/pathToXML/Main.xml";

        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!

        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
        NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);

        //Save all the products
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < bookIdNodes.getLength(); ++i) {
            Node bookId = bookIdNodes.item(i);

            System.out.println(bookId.getTextContent());
            bookIds.add(bookId.getTextContent());
        }

        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //With such query I can find node based on bookId
            String xpathQuery = "//ID[BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

            //We store the new XML file in bookId.xml e.g. 001.xml
            Document aamcIdXml = dBuilder.newDocument();
            Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
            aamcIdXml.appendChild(root);
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                root.appendChild(copyNode);
            }


            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);

            StreamResult result =  new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
            transformer.transform(source, result);

            System.out.println("Done for " + bookId);
        }
    }

}

Tags: java, xml

Solution


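You almost had it working. Inside the loop that iterates over the book IDs, you can change your XPath so that it selects the Book or PendingBook element, and then work with that element. Besides the newly created Main element, you also need to create a Books element, append it to Main, and append the Book or PendingBook to the new Books element.

The XPath is: //ancestor::*[IdentifyingInformation/ID/BookId=bookId]

It selects the ancestor element whose IdentifyingInformation/ID/BookId matches the ID of the current iteration, i.e. the Book or PendingBook element.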
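To see what this expression selects, here is a minimal sketch that evaluates it against an in-memory document. The class name BookXPathDemo and the helpers parse and select are hypothetical names introduced for illustration; the document is parsed without namespace awareness, as in the question's code, which is why the unprefixed names in the XPath match.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class BookXPathDemo {

    /** Parses the XML string without namespace awareness (the factory default). */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates the expression and returns the names of the selected nodes. */
    static String[] select(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        String[] names = new String[nodes.getLength()];
        for (int i = 0; i < nodes.getLength(); i++) {
            names[i] = nodes.item(i).getNodeName();
        }
        return names;
    }
}
```

Calling select(doc, "//ancestor::*[IdentifyingInformation/ID/BookId='003']") on the question's XML should return only the PendingBook element, and the shorter //*[IdentifyingInformation/ID/BookId='003'] selects the same node. Note that quoting matters: in XPath 1.0 an unquoted BookId=001 is a numeric comparison, so it would also match a BookId whose text is 1 or 01, whereas BookId='001' compares strings.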

        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //Select the Book or PendingBook element that contains this bookId.
            //The id is quoted so that XPath compares it as a string, not as a number.
            String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);

            //Build the new document: Main > Books > Book/PendingBook
            Document aamcIdXml = dBuilder.newDocument();
            Element root = aamcIdXml.createElement("Main");
            Element booksNode = aamcIdXml.createElement("Books");
            root.appendChild(booksNode);
            aamcIdXml.appendChild(root);
            String bookName = "";
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                //Remember the element name (Book or PendingBook) for the file name
                bookName = copyNode.getNodeName();
                booksNode.appendChild(copyNode);
            }

            //At the end, we save the XML file on disk, e.g. Book_001.xml
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);

            StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
            transformer.transform(source, result);

            System.out.println("Done for " + bookId);
        }

I also modified the code to name the files as required, e.g. Book_001.xml.
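The loop above builds Main and Books with createElement, so the xmlns declaration and sibling elements such as OtherInfo are not carried over into the output, which may matter for the XSD validation mentioned in the question. One possible alternative, shown here only as a sketch, is to copy the whole document for each book and remove the other Book/PendingBook elements. The class SplitByClone and its methods are hypothetical names; the sketch assumes a namespace-aware parse, a single Books element, and returns the serialized XML in a map keyed by file name instead of writing files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the original answer's code.
public class SplitByClone {

    /** Returns the Book and PendingBook children of the first Books element. */
    static List<Element> bookElements(Document doc) {
        List<Element> result = new ArrayList<>();
        // Assumption: the document contains exactly one Books element.
        Node books = doc.getElementsByTagNameNS("*", "Books").item(0);
        for (Node n = books.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                String name = n.getLocalName();
                if ("Book".equals(name) || "PendingBook".equals(name)) {
                    result.add((Element) n);
                }
            }
        }
        return result;
    }

    /** Maps a file name such as Book_001.xml to the XML containing only that book. */
    static Map<String, String> split(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // set before the builder is created
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> files = new LinkedHashMap<>();
        int count = bookElements(doc).size();
        for (int keep = 0; keep < count; keep++) {
            // Deep-copy the whole tree so Main, its namespace and OtherInfo are preserved.
            Document copy = builder.newDocument();
            copy.appendChild(copy.importNode(doc.getDocumentElement(), true));

            // Remove every Book/PendingBook except the one at index 'keep'.
            List<Element> books = bookElements(copy);
            for (int i = 0; i < books.size(); i++) {
                if (i != keep) {
                    books.get(i).getParentNode().removeChild(books.get(i));
                }
            }

            Element kept = books.get(keep);
            String id = kept.getElementsByTagNameNS("*", "BookId")
                    .item(0).getTextContent().trim();

            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(copy), new StreamResult(writer));
            files.put(kept.getLocalName() + "_" + id + ".xml", writer.toString());
        }
        return files;
    }
}
```

Each entry of the returned map could then be written to disk or passed straight to a validator. Copying and pruning keeps the original structure without having to recreate Main and Books by hand, at the cost of one full copy of the document per book.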

