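java - Unable to get XML from a web page into a String in Java
Question
I have a problem getting XML from this web page. In the browser it displays correctly without any issue, but from Java it is a different story.
I tried two approaches, and both of them throw an exception.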
// Method 1 - Using Java's URL
URL url = new URL(/* mentioned link */);
String rawXML = new String(url.openStream().readAllBytes(), StandardCharsets.UTF_8); // java.io.IOException: Invalid Http response
// Method 2 - Using Apache's HTTP client
HttpGet httpGet = new HttpGet(/* mentioned link */);
String rawXML = EntityUtils.toString(HttpClients.createDefault().execute(httpGet).getEntity()); // org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
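Downloading this page with wget and the --content-on-error parameter works, but that is not a reliable solution because wget is not available on every system, for example Windows.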
Solution
The response does not contain any headers, so Java rejects it. The wget output shows this:
wget "https://www.strava.cz/foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148" -O so-69226464.html
--2021-09-17 13:44:29-- https://www.strava.cz/foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148
Resolving www.strava.cz (www.strava.cz)... 82.99.180.77
Connecting to www.strava.cz (www.strava.cz)|82.99.180.77|:443... connected.
HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
Length: unspecified
This Java class, which sends a raw HTTP GET request over a socket, is able to fetch the content. It is based on this page.
The request sent is
GET /foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148 HTTP/1.1\r\n
User-Agent: RawHttpGet\r\n
Host: www.strava.cz\r\n
Accept: */*\r\n
Java code:
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.Socket;
import javax.net.ssl.SSLSocketFactory;

public class RawHttpGet {
    private static final String HOSTNAME = "www.strava.cz";
    // Encoding used for this request and response; use StandardCharsets.UTF_8 for a UTF-8 server
    private static final String ENCODING = "Cp1250";

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the socket and both of its streams
        try (Socket socket = SSLSocketFactory.getDefault().createSocket(HOSTNAME, 443)) {
            BufferedWriter out = new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), ENCODING));

            // Build the raw request; the empty line at the end terminates the header section
            StringBuilder request = new StringBuilder("GET /foxisapi/foxisapi.dll/istravne.istravne.process?xmljidelnickyA&zarizeni=3148 HTTP/1.1\r\n");
            request.append("User-Agent: RawHttpGet\r\n");
            request.append("Accept: */*\r\n");
            request.append("Host: ").append(HOSTNAME).append("\r\n");
            request.append("\r\n");

            System.out.println(" * Request");
            System.out.println(request);

            // Send the request
            out.write(request.toString());
            out.flush();

            // Read the response; readAllBytes() returns once the server closes the connection
            System.out.println(" * Response");
            System.out.println(new String(socket.getInputStream().readAllBytes(), ENCODING));
        }
    }
}
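Since this server answers without a status line or headers, everything read from the socket is already the body. If the same code might also be pointed at a server that does send headers, a small helper can handle both cases. The following is only a sketch: the class name RawResponseBody and its methods are hypothetical and not part of the original answer, and it assumes that a response with headers starts with "HTTP/" and ends its header section with an empty line.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the original answer): separates the body
// from the raw bytes read off the socket.
final class RawResponseBody {

    // Returns the index of the first body byte. A response that does not
    // start with "HTTP/" is treated as header-less (HTTP/0.9 style), so the
    // body starts at offset 0.
    public static int bodyOffset(byte[] raw) {
        byte[] prefix = "HTTP/".getBytes(StandardCharsets.US_ASCII);
        if (raw.length < prefix.length) {
            return 0;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (raw[i] != prefix[i]) {
                return 0;
            }
        }
        // Headers are present: the body starts after the first empty line (CRLF CRLF).
        for (int i = 0; i + 3 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n' && raw[i + 2] == '\r' && raw[i + 3] == '\n') {
                return i + 4;
            }
        }
        // The header section was never terminated, so there is no body.
        return raw.length;
    }

    // Decodes only the body part with the given charset.
    public static String extractBody(byte[] raw, Charset charset) {
        int start = bodyOffset(raw);
        return new String(raw, start, raw.length - start, charset);
    }

    public static void main(String[] args) {
        byte[] bare = "<a>text</a>".getBytes(StandardCharsets.US_ASCII);
        byte[] full = "HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<a>text</a>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(extractBody(bare, StandardCharsets.US_ASCII));
        System.out.println(extractBody(full, StandardCharsets.US_ASCII));
    }
}
```

In RawHttpGet the call would look like RawResponseBody.extractBody(socket.getInputStream().readAllBytes(), Charset.forName("Cp1250")). The sketch does not handle chunked transfer encoding or Content-Length, so it is only suitable for simple responses like the one above.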