Partial content streamed from a Java URL connection

Problem description

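From my Java application, I have been unable to reliably retrieve the full response for https://trainingapply.grants.gov/apply/opportunities/schemas/applicant/PKG00037270/Project.xsd. Sometimes I get the entire body. Other times I get only the first 4072 characters and the body ends there. I have tried the following: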

// Core Java (java.net.URLConnection)
try {
  URLConnection urlConnection = new URL(schemaUrl).openConnection();
  StringBuilder schema = new StringBuilder();
  String line;

  try (BufferedReader br = new BufferedReader(new InputStreamReader(urlConnection.getInputStream(), "UTF-8"))) {
    while ((line = br.readLine()) != null) {
      schema.append(line);
    }
  }

  return schema.toString();
} catch (IOException e) {
  //[error handling]
}
// Apache Commons IO
return IOUtils.toString(new URL(schemaUrl), "UTF-8");

I also tried the Apache HttpComponents library, with the same result. Oddly, Postman, my browser, and curl all return a consistent and complete body, so this must be specific to Java. I am using Java 1.8.0_181.

Interestingly, when I curl this URL, there is a network packet boundary exactly 4072 bytes into the content, as shown here:

<= Recv data, 4072 bytes (0xfe8)
0000: <?xml version="1.0" encoding="UTF-8"?>
...
...
0fb4:                <xsd:complexType>
0fd6:                   
<= Recv data, 2688 bytes (0xa80)
0000:  <xsd:sequence>
0011: ..             <xsd:element name="SubApplicationGroupID" type="x

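It seems clear to me that these other clients handle the request correctly, while the built-in Java primitives do not always read to the end of the content length declared in the headers. Here are the headers for reference: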

=> Send header, 149 bytes (0x95)
0000: GET /apply/opportunities/schemas/applicant/PKG00037270/Project.x
0040: sd HTTP/1.1
004d: Host: trainingapply.grants.gov
006d: User-Agent: curl/7.54.0
0086: Accept: */*
0093: 
<= Recv header, 17 bytes (0x11)
0000: HTTP/1.1 200 OK
<= Recv header, 24 bytes (0x18)
0000: Connection: Keep-Alive
<= Recv header, 37 bytes (0x25)
0000: Date: Thu, 21 Feb 2019 20:50:04 GMT
<= Recv header, 18 bytes (0x12)
0000: Pragma: no-cache
<= Recv header, 22 bytes (0x16)
0000: Content-Length: 6760
<= Recv header, 39 bytes (0x27)
0000: Content-Type: text/xml; charset=UTF-8
<= Recv header, 40 bytes (0x28)
0000: Expires: Thu, 01 Jan 1970 00:00:00 GMT
<= Recv header, 33 bytes (0x21)
0000: X-XSS-Protection: 1; mode=block
<= Recv header, 21 bytes (0x15)
0000: X-ORACLE-DMS-RID: 0
<= Recv header, 33 bytes (0x21)
0000: X-Content-Type-Options: nosniff
<= Recv header, 114 bytes (0x72)
0000: Set-Cookie: JSESSIONID=M6AR0orf2aDdcnmBu-LHOBRL4pjuVTRhx_n-uj-2a
0040: p7OSBVfCuVG!1960121785; path=/; secure; HttpOnly
<= Recv header, 66 bytes (0x42)
0000: X-ORACLE-DMS-ECID: 7241b3bd-75bf-48a0-85e6-57059f2a08da-007a0021
<= Recv header, 35 bytes (0x23)
0000: X-Powered-By: Servlet/3.1 JSP/2.3
<= Recv header, 29 bytes (0x1d)
0000: X-Frame-Options: SAMEORIGIN
<= Recv header, 45 bytes (0x2d)
0000: Strict-Transport-Security: max-age=31536000
<= Recv header, 2 bytes (0x2)
0000: 
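
One way to make such a truncation visible rather than silent is to compare the number of bytes actually read with the declared Content-Length (6760 in the trace above). The following is only a minimal sketch and was not among the original attempts. The class name LengthCheckedFetch and the methods readChecked and fetch are hypothetical names introduced for illustration, and the sketch assumes the body is not transparently decompressed, so the header and the byte count are comparable.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, introduced for illustration only.
final class LengthCheckedFetch {

    /**
     * Reads the stream until end-of-stream and throws if the number of bytes
     * differs from expectedLength. A negative expectedLength means "unknown"
     * and disables the check.
     */
    static byte[] readChecked(InputStream in, long expectedLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        if (expectedLength >= 0 && out.size() != expectedLength) {
            throw new EOFException(
                "Expected " + expectedLength + " bytes but read " + out.size());
        }
        return out.toByteArray();
    }

    /** Fetches a URL as UTF-8 text, verifying the body against Content-Length. */
    static String fetch(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        try (InputStream in = connection.getInputStream()) {
            // getContentLengthLong() returns -1 when the header is absent.
            byte[] body = readChecked(in, connection.getContentLengthLong());
            return new String(body, StandardCharsets.UTF_8);
        }
    }
}
```

With a check like this, a response cut off at 4072 bytes would surface as an exception that the caller can log or retry, instead of being returned as if it were a complete schema.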

Any ideas what might be happening here?

Tags: java, url, curl

Solution


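It turned out that my application was calling this common service many times in rapid succession, sometimes requesting the same URL several times. In that situation the server appeared to be closing the connection, so the response was truncated. I put a cache in front of these URL fetches and the problem seems to have gone away. So it had nothing to do with the libraries or how I was using them; the network connection was being closed without any feedback or warning.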
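
The answer does not show the cache itself, so the following is only a sketch of one possible shape, assuming an in-memory cache keyed by URL is acceptable. SchemaCache and its nested Loader interface are hypothetical names, not part of the original application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache, introduced for illustration only.
final class SchemaCache {

    /** Hypothetical abstraction over however a schema is actually fetched. */
    interface Loader {
        String load(String url) throws IOException;
    }

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Loader loader;

    SchemaCache(Loader loader) {
        this.loader = loader;
    }

    /** Returns the cached body for url, fetching it at most once per URL. */
    String get(String url) {
        // ConcurrentHashMap.computeIfAbsent runs the function atomically per key,
        // so concurrent callers asking for the same URL trigger a single fetch.
        return cache.computeIfAbsent(url, key -> {
            try {
                return loader.load(key);
            } catch (IOException e) {
                // Nothing is stored when the function throws, so a failed
                // fetch is retried on the next call instead of being cached.
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

The loader could wrap any of the fetch approaches from the question, for example new SchemaCache(url -> IOUtils.toString(new URL(url), "UTF-8")). This version never evicts entries, which is reasonable only if the set of schema URLs is small and their content is stable.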

