Reading an XML API response in Java

Question

I am trying to read the XML response below, but parsing it throws an error.

<html>
<head>
    <title>OK</title>
</head>
    <body>
    <h1>OK</h1>
    <table>
        <tbody>
            <tr>
                <td>Status</td>
                <td><div id="Status">200</div></td>
            </tr>
            <tr>
                <td>Message</td>
                <td><div id="Message">Page created</div></td>
            </tr>
            <tr>
                <td>Location</td>
                <td><a href="/content/parentnode/demopage" id="Location">/content/parentnode/demopage</a></td>
            </tr>
            <tr>
                <td>Parent Location</td>
                <td><a href="/content/parentnode" id="ParentLocation">/content/parentnode</a></td>
            </tr>
            <tr>
                <td>Path</td>
                <td><div id="Path">/content/parentnode/demopage</div></td>
            </tr>
            <tr>
                <td>Referer</td>
                <td><a href="" id="Referer"></a></td>
            </tr>
            <tr>
                <td>ChangeLog</td>
                <td><div id="ChangeLog">&lt;pre&gt;&lt;/pre&gt;</div></td>
            </tr>
        </tbody>
    </table>
    <p><a href="">Go Back</a></p>
    <p><a href="/content/parentnode/demopage">Modified Resource</a></p>
    <p><a href="/content/parentnode">Parent of Modified Resource</a></p>
    </body>
</html>

I am trying to read the "Page created" message with the following code:

Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(response.toString())));

        NodeList nodes = doc.getElementsByTagName("div");
        if (nodes.getLength() > 0) {
            Element ele = (Element) nodes.item(0);
            System.out.println("Page created -"
                    + ele.getElementsByTagName("//div[contains(@id,'Message')]").item(0).getTextContent());
        } else {    
        }

[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:262)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at working.OkhttpCreatePage.main(OkhttpCreatePage.java:40)

Line 40 is .parse(new InputSource(new StringReader(response.toString())));

What am I doing wrong?

Tags: java, xml, xml-parsing

Answer


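The HTML you are parsing happens to be well-formed enough for Java's DOM parser, but that may be a coincidence: another HTML response could contain markup that is invalid as XML. If you are 100% sure the response will always be XML/XHTML, that should not be a problem. Otherwise it makes sense to switch to the JSoup parser, as suggested in another answer.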

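As for the "Content is not allowed in prolog" error, it is probably caused by whitespace or other characters that appear before the actual XML document starts. You can try trimming the string before parsing it, or take the substring from the first < character to the end.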
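The clean-up step described above could look roughly like the sketch below. The class and method names (PrologCleaner, stripBeforeFirstTag) are made up for illustration and do not come from the question. The sketch assumes the response body is already available as a String.

```java
final class PrologCleaner {

    private PrologCleaner() {}

    // Returns the input starting at the first '<'. Anything before it
    // (whitespace, a byte-order mark, stray text) is discarded.
    // If the input contains no '<' at all, it is returned unchanged so
    // that the parser still reports the underlying problem.
    static String stripBeforeFirstTag(String raw) {
        int start = raw.indexOf('<');
        return start > 0 ? raw.substring(start) : raw;
    }

    public static void main(String[] args) {
        // A byte-order mark, a space and a newline in front of the markup.
        String raw = "\uFEFF \n<html><body>OK</body></html>";
        System.out.println(stripBeforeFirstTag(raw));
    }
}
```

If the error persists after this step, it may be worth printing the string before parsing to confirm that it really contains the markup and not something else.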

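Also note that your XPath logic is slightly off: getElementsByTagName expects a tag name, not an XPath expression. Here is a corrected version: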

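import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

    // xml holds the response body as a String, with nothing before the first '<'.
    Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

    // The expression starts with //, so it searches the whole document.
    // evaluate(String, Object) returns the text of the first matching node.
    String message = XPathFactory.newInstance().newXPath()
            .evaluate("//div[contains(@id,'Message')]", doc);
    if (!message.isEmpty()) {
        System.out.println("Page created - " + message);
    }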
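
Since the response exposes several fields through id attributes (Status, Message, Path and so on), the same lookup can be wrapped in a small helper. The following is only a sketch: the class name ResponseFields, the method fieldById, and the shortened sample markup are assumptions for illustration, and checked exceptions are wrapped in an IllegalStateException purely to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ResponseFields {

    private ResponseFields() {}

    // Parses the markup as XML and returns the text of the first element
    // whose id attribute equals the given id, or "" if there is none.
    // The id is concatenated into the expression, so this is only suitable
    // for fixed, trusted ids that contain no quote characters.
    static String fieldById(String xml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml.trim())));
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//*[@id='" + id + "']", doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query the response", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the response from the question.
        String xml = "<html><body><table><tbody>"
                + "<tr><td>Status</td><td><div id=\"Status\">200</div></td></tr>"
                + "<tr><td>Message</td><td><div id=\"Message\">Page created</div></td></tr>"
                + "</tbody></table></body></html>";
        System.out.println(fieldById(xml, "Status"));   // prints 200
        System.out.println(fieldById(xml, "Message"));  // prints Page created
    }
}
```

This re-parses the document on every call, which is fine for a one-off check. For repeated lookups it would be more sensible to parse once and reuse the Document.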
