Unable to fetch XML from a web page into a String in Java

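Problem description

I am having trouble fetching the XML from this web page. In a browser it displays correctly with no problems, but from Java things are different.

I tried two approaches, and both throw an exception.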

// Method 1 - Using Java's URL
URL url = new URL(/* mentioned link */);
String rawXML = new String(url.openStream().readAllBytes(), StandardCharsets.UTF_8); // java.io.IOException: Invalid Http response
// Method 2 - Using Apache's HTTP client
HttpGet httpGet = new HttpGet(/* mentioned link */);
String rawXML = EntityUtils.toString(HttpClients.createDefault().execute(httpGet).getEntity()); // org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response

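Downloading the page with wget and its --content-on-error option does work, but that is not a reliable solution because wget is not available on every system, Windows for example.

Tags: java, xml, http

Solution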


The response contains no HTTP headers, so Java rejects it. wget reports the same thing:

wget "https://www.strava.cz/foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148" -O so-69226464.html
--2021-09-17 13:44:29--  https://www.strava.cz/foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148
Resolving www.strava.cz (www.strava.cz)... 82.99.180.77
Connecting to www.strava.cz (www.strava.cz)|82.99.180.77|:443... connected.
HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
Length: unspecified
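
To make the diagnosis concrete, here is a minimal sketch of a check that tells a normal HTTP response apart from a header-less one like the response wget reports above. The class name `StatusLineCheck`, the method `hasStatusLine`, and both sample payloads are hypothetical and used only for illustration; they are not the server's actual output.

```java
import java.nio.charset.StandardCharsets;

class StatusLineCheck {
    // Every HTTP/1.x response starts with a status line such as "HTTP/1.1 200 OK"
    private static final byte[] HTTP_PREFIX = "HTTP/".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the raw response bytes begin with "HTTP/". */
    static boolean hasStatusLine(byte[] rawResponse) {
        if (rawResponse.length < HTTP_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < HTTP_PREFIX.length; i++) {
            if (rawResponse[i] != HTTP_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up payloads: one well-formed response, one body-only response
        byte[] normal = "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<a/>"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] headerless = "<?xml version=\"1.0\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(hasStatusLine(normal));     // true
        System.out.println(hasStatusLine(headerless)); // false
    }
}
```

A client that expects a status line, as both methods in the question do, fails on the second kind of payload, which is consistent with the exceptions shown there.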

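The following Java class sends a raw HTTP GET request over a TLS socket and is able to fetch the content. It is based on a page linked from the original answer.
The request sent is: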

GET /foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148 HTTP/1.1\r\n
User-Agent: RawHttpGet\r\n
Accept: */*\r\n
Host: www.strava.cz\r\n
\r\n

Java code:

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.Socket;
import java.nio.charset.Charset;

import javax.net.ssl.SSLSocketFactory;

public class RawHttpGet {
    private static final String HOSTNAME = "www.strava.cz";
    // Encoding for this request; use StandardCharsets.UTF_8 for a UTF-8 site
    private static final Charset CHARSET = Charset.forName("Cp1250");

    public static void main(String[] args) throws IOException {
        // Closing the socket also closes its input and output streams
        try (Socket socket = SSLSocketFactory.getDefault().createSocket(HOSTNAME, 443)) {
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), CHARSET));

            // Build the request by hand; the empty line terminates the header section
            StringBuilder request = new StringBuilder("GET /foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148 HTTP/1.1\r\n");
            request.append("User-Agent: RawHttpGet\r\n");
            request.append("Accept: */*\r\n");
            request.append("Host: " + HOSTNAME + "\r\n");
            request.append("\r\n");
            System.out.println(" * Request");
            System.out.println(request);

            // Send the request
            out.write(request.toString());
            out.flush();

            // Read the whole response as raw bytes and decode it with the request's encoding.
            // No HTTP parsing happens here, so the missing headers are not a problem.
            System.out.println(" * Response");
            System.out.println(new String(socket.getInputStream().readAllBytes(), CHARSET));
        }
    }
}
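
Once the raw bytes are available, they can be turned into a DOM document. The following is only a sketch, assuming the response body is the XML document alone (no headers to strip) and that it is encoded in Cp1250 as in the class above. The class name `XmlFromRawResponse`, the method `parse`, and the sample `<menu>` document are hypothetical and do not reflect the real structure of the server's XML.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.Charset;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlFromRawResponse {
    /** Decodes the raw body with the given charset and parses it as XML. */
    static Document parse(byte[] body, Charset charset) {
        // Decode first, then hand the parser a character stream
        String xml = new String(body, charset);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Body is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("Cp1250");
        // Made-up sample document standing in for the downloaded bytes
        byte[] sample = "<?xml version=\"1.0\" encoding=\"windows-1250\"?><menu><item>Pol\u00e9vka</item></menu>"
                .getBytes(cp1250);

        Document doc = parse(sample, cp1250);
        System.out.println(doc.getDocumentElement().getNodeName()); // menu
    }
}
```

In `RawHttpGet`, the bytes returned by `socket.getInputStream().readAllBytes()` could be passed to `parse` instead of being printed.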
