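java - Split XML into smaller chunks based on a grandchild's id
Problem description
I have an XML document that should be split into smaller chunks, one per unique BookId node. Basically, I need to filter each book into a separate XML file that has the same structure as the initial XML.
The reason for this is a requirement to validate each smaller XML against an XSD, to determine which Book/PendingBook is invalid.
Note that the Books node can contain both Book and PendingBook nodes.
Initial XML: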
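The validation step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the real XSD is not shown here, so the XSD string below is a hypothetical stand-in that checks a single Year element, and the names XsdCheck and validate are made up for the example. Only the standard javax.xml.validation API is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdCheck {

    // Hypothetical stand-in schema: the real XSD is not part of the question.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
      + " targetNamespace=\"http://some/url/name\" elementFormDefault=\"qualified\">"
      + "<xs:element name=\"Main\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"Year\" type=\"xs:gYear\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null when the XML is valid, otherwise the validator's error message. */
    static String validate(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<Main xmlns=\"http://some/url/name\"><Year>2021</Year></Main>";
        String bad = "<Main xmlns=\"http://some/url/name\"><Year>abc</Year></Main>";
        System.out.println("ok:  " + validate(ok));
        System.out.println("bad: " + validate(bad));
    }
}
```

Running a check like this once per split file is what makes it possible to tell which Book/PendingBook is the invalid one.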
<Main xmlns="http://some/url/name">
  <Books>
    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>
    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>
    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>
    <OtherInfo>...</OtherInfo>
  </Books>
</Main>
The result should look like the following XML files:
Book_001.xml (BookId = 001):
<Main xmlns="http://some/url/name">
  <Books>
    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2021</Year>
          <BookId>001</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>
    <OtherInfo>...</OtherInfo>
  </Books>
</Main>
Book_002.xml (BookId = 002):
<Main xmlns="http://some/url/name">
  <Books>
    <Book>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>002</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </Book>
    <OtherInfo>...</OtherInfo>
  </Books>
</Main>
PendingBook_003.xml (BookId = 003):
<Main xmlns="http://some/url/name">
  <Books>
    <PendingBook>
      <IdentifyingInformation>
        <ID>
          <Year>2020</Year>
          <BookId>003</BookId>
          <BookDateTime>2021-05-10T12:35:00</BookDateTime>
        </ID>
      </IdentifyingInformation>
    </PendingBook>
    <OtherInfo>...</OtherInfo>
  </Books>
</Main>
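So far I have only managed to extract each ID node into a smaller XML, and I create the root element manually.
Ideally, I would like to copy all elements from the initial XML and put a single Book/PendingBook node into the Books node.
My Java example: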
package com.main;

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExtractXmls {
    /**
     * @param args
     */
    public static void main(String[] args) throws Exception
    {
        String inputFile = "C:/pathToXML/Main.xml";
        File xmlFile = new File(inputFile);
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // never forget this!
        XPathFactory xfactory = XPathFactory.newInstance();
        XPath xpath = xfactory.newXPath();
        XPathExpression allBookIdsExpression = xpath.compile("//Books/*/IdentifyingInformation/ID/BookId/text()");
        NodeList bookIdNodes = (NodeList) allBookIdsExpression.evaluate(doc, XPathConstants.NODESET);
        //Save all the products
        List<String> bookIds = new ArrayList<>();
        for (int i = 0; i < bookIdNodes.getLength(); ++i) {
            Node bookId = bookIdNodes.item(i);
            System.out.println(bookId.getTextContent());
            bookIds.add(bookId.getTextContent());
        }
        //Now we create and save split XMLs
        for (String bookId : bookIds)
        {
            //With such query I can find node based on bookId
            String xpathQuery = "//ID[BookId='" + bookId + "']";
            xpath = xfactory.newXPath();
            XPathExpression query = xpath.compile(xpathQuery);
            NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
            System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
            //We store the new XML file in bookId.xml e.g. 001.xml
            Document aamcIdXml = dBuilder.newDocument();
            Element root = aamcIdXml.createElement("Main"); //Here I'm recreating root element (don't know if I can avoid it and copy somehow structure of initial xml)
            aamcIdXml.appendChild(root);
            for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
                Node node = bookIdNodesFiltered.item(i);
                Node copyNode = aamcIdXml.importNode(node, true);
                root.appendChild(copyNode);
            }
            //At the end, we save the file XML on disk
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            DOMSource source = new DOMSource(aamcIdXml);
            StreamResult result = new StreamResult(new File("C:/pathToXML/" + bookId.trim() + ".xml"));
            transformer.transform(source, result);
            System.out.println("Done for " + bookId);
        }
    }
}
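Solution
You almost had it working. Inside the loop that iterates over the book IDs, change your XPath so that it selects the Book or PendingBook element, and work with that element instead of the ID node. Besides the newly created Main element, you also need to create a Books element and append the Book or PendingBook to that new Books element.
The XPath is: //ancestor::*[IdentifyingInformation/ID/BookId=bookId]
It selects the element whose IdentifyingInformation/ID/BookId matches the ID of the current iteration, i.e. the Book or PendingBook element.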
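To see what this expression selects, here is a small self-contained check. It is a sketch under assumptions, not part of the original answer: the class name XPathSelectDemo, the helper selectedNames and the trimmed sample XML are invented for illustration. It relies on the default (non namespace aware) parser, as the question's code does, and it also shows that an unquoted ID is compared as a number under XPath 1.0 rules, while a quoted one is compared as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSelectDemo {

    // Trimmed-down sample (illustrative only), same shape as the initial XML.
    static final String XML =
        "<Main xmlns=\"http://some/url/name\"><Books>"
      + "<Book><IdentifyingInformation><ID><BookId>001</BookId></ID></IdentifyingInformation></Book>"
      + "<PendingBook><IdentifyingInformation><ID><BookId>003</BookId></ID></IdentifyingInformation></PendingBook>"
      + "<OtherInfo>...</OtherInfo>"
      + "</Books></Main>";

    /** Returns the names of the elements selected for the given right-hand side of the comparison. */
    static List<String> selectedNames(String rhs) {
        try {
            // Default factory: not namespace aware, so unprefixed names in the XPath match.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            String query = "//ancestor::*[IdentifyingInformation/ID/BookId=" + rhs + "]";
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(query, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(selectedNames("'001'")); // quoted: string comparison
        System.out.println(selectedNames("'003'")); // quoted: string comparison
        System.out.println(selectedNames("1"));     // unquoted: numeric comparison
    }
}
```

Because the predicate sits on the element that contains IdentifyingInformation, the match is the Book or PendingBook itself rather than the inner ID node. If the IDs are not guaranteed to be numeric, quoting the value is the safer choice.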
//Now we create and save split XMLs
for (String bookId : bookIds)
{
    //Select the Book/PendingBook element that owns this bookId.
    //The id is quoted so that it is compared as a string, not as a number.
    String xpathQuery = "//ancestor::*[IdentifyingInformation/ID/BookId='" + bookId + "']";
    xpath = xfactory.newXPath();
    XPathExpression query = xpath.compile(xpathQuery);
    NodeList bookIdNodesFiltered = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    System.out.println("Found " + bookIdNodesFiltered.getLength() + " bookId(s) for bookId " + bookId);
    //We store the new XML file in <elementName>_<bookId>.xml e.g. Book_001.xml
    Document aamcIdXml = dBuilder.newDocument();
    //Recreate the Main/Books skeleton by hand.
    //Note: createElement adds no xmlns declaration, and OtherInfo is not copied.
    Element root = aamcIdXml.createElement("Main");
    Element booksNode = aamcIdXml.createElement("Books");
    root.appendChild(booksNode);
    aamcIdXml.appendChild(root);
    String bookName = "";
    for (int i = 0; i < bookIdNodesFiltered.getLength(); i++) {
        Node node = bookIdNodesFiltered.item(i);
        Node copyNode = aamcIdXml.importNode(node, true);
        //Remember the element name (Book or PendingBook) for the file name
        bookName = copyNode.getNodeName();
        booksNode.appendChild(copyNode);
    }
    //At the end, we save the file XML on disk
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    DOMSource source = new DOMSource(aamcIdXml);
    StreamResult result = new StreamResult(new File(bookName + "_" + bookId.trim() + ".xml"));
    transformer.transform(source, result);
    System.out.println("Done for " + bookId);
}
I also modified the code to name the files as required, e.g. Book_001.xml.
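As far as I can tell, the loop above rebuilds Main and Books by hand, so the xmlns declaration and the OtherInfo sibling from the initial XML are not carried over. One possible alternative, sketched here under assumptions, is to copy the whole tree for each entry and then delete the other Book/PendingBook elements from the copy. The class name SplitByPruning, the helpers split and entries, and the trimmed SAMPLE string are invented for this illustration; the parser is made namespace aware and elements are matched by local name. It returns the XML as strings instead of writing files, to keep the sketch short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SplitByPruning {

    // Trimmed-down sample (illustrative only), same shape as the initial XML.
    static final String SAMPLE =
        "<Main xmlns=\"http://some/url/name\"><Books>"
      + "<Book><IdentifyingInformation><ID><BookId>001</BookId></ID></IdentifyingInformation></Book>"
      + "<Book><IdentifyingInformation><ID><BookId>002</BookId></ID></IdentifyingInformation></Book>"
      + "<PendingBook><IdentifyingInformation><ID><BookId>003</BookId></ID></IdentifyingInformation></PendingBook>"
      + "<OtherInfo>...</OtherInfo></Books></Main>";

    /** Maps an output name such as "Book_001" to the serialized XML for that entry. */
    static Map<String, String> split(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            int count = entries(doc).size();
            Map<String, String> result = new LinkedHashMap<>();
            for (int keep = 0; keep < count; keep++) {
                // Copy the whole tree, then drop every Book/PendingBook except one.
                Document copy = builder.newDocument();
                copy.appendChild(copy.importNode(doc.getDocumentElement(), true));
                List<Element> copied = entries(copy);
                Element kept = copied.get(keep);
                for (Element e : copied) {
                    if (e != kept) {
                        e.getParentNode().removeChild(e);
                    }
                }
                String id = kept.getElementsByTagNameNS("*", "BookId").item(0).getTextContent().trim();
                StringWriter out = new StringWriter();
                transformer.transform(new DOMSource(copy), new StreamResult(out));
                result.put(kept.getLocalName() + "_" + id, out.toString());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the Book and PendingBook children of the first Books element. */
    static List<Element> entries(Document doc) {
        Node books = doc.getElementsByTagNameNS("*", "Books").item(0);
        List<Element> found = new ArrayList<>();
        for (Node n = books.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element
                    && ("Book".equals(n.getLocalName()) || "PendingBook".equals(n.getLocalName()))) {
                found.add((Element) n);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        split(SAMPLE).forEach((name, xml) -> System.out.println(name + ".xml -> " + xml));
    }
}
```

Pruning a copy keeps whatever else lives under Main and Books without having to list it, which matters when the split files must validate against the same XSD as the original.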