java - Error reading a URL
Problem description
I want to read https://www.instagram.com/mobonews/?__a=1
using Java. The source returned by that URL is:
{"logging_page_id":"profilePage_1410389643","show_suggested_profiles":false,"show_follow_dialog":false,"graphql":{"user":{"biography":"\u200f\u200e\u0645\u0627\u062c\u0631\u0627\u062c\u0648\u06cc\u06cc\u200c\u0647\u0627\u06cc \u0645\u0646 \u062f\u0631 \u062
But the code below returns this instead:
<!DOCTYPE html><html lang="en" class="no-js not-logged-in client-root"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <title>Login • Instagram</title> <meta name="robots" content="noimageindex, noarchive"> <meta name="apple-mobile-web-app-status-bar-style" content="default"> <meta name="mobile-web-app-capable" content="yes"> <meta name="theme-color" content="#ffffff"> <meta id="viewport" name="viewport"
This is the code I am using:
URL website = new URL(url);
URLConnection connection = website.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
connection.getInputStream()));
StringBuilder response = new StringBuilder();
String inputLine;
while ((inputLine = in.readLine()) != null)
response.append(inputLine);
Edit:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils;
public class TestReadurlInsta {
/**
* @param args the command line arguments
*/
public static void main(String[] args) throws IOException {
URL u = new URL("https://www.instagram.com/mobonews/?__a=1");
URLConnection con = u.openConnection();
InputStream in = con.getInputStream();
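// Note: getContentEncoding() returns the Content-Encoding header (e.g. "gzip"),
// not the character set; the charset is a parameter of Content-Type.
String encoding = con.getContentEncoding();
encoding = encoding == null ? "UTF-8" : encoding;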
String body = IOUtils.toString(in, encoding);
System.out.println(body);
}
}
Edit 2:
For some reason I seem to be getting Instagram's login page:
<!DOCTYPE html> <html lang="en" class="no-js not-logged-in client-root"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <title> Login • Instagram </title>
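In the past I ran the same code from the same machine and everything worked fine, but it suddenly started giving me this problem.
Edit 3:
I ran the same code from an online IDE and got the following exception. It looks like the connection is being refused and, as @Holger said, Instagram may be blocking access to this resource: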
Exception in thread "main" java.net.UnknownHostException: www.instagram.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at java.net.URL.openStream(URL.java:1045)
at HelloWorld.main(HelloWorld.java:14)
But is there any solution?
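A note on the stack trace above: java.net.UnknownHostException is thrown when the host name cannot be resolved to an IP address, which is a different failure from a refused connection or a server-side block (many online IDEs simply have no outbound network or DNS access). The following is a minimal sketch that tells these cases apart; the class name NetworkFailureKind and the method describe are hypothetical and used only for illustration.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class NetworkFailureKind {

    // Hypothetical helper: maps common java.net exceptions to a short description.
    public static String describe(IOException e) {
        if (e instanceof UnknownHostException) {
            // The name never resolved, so no request reached the server.
            return "DNS lookup failed";
        }
        if (e instanceof ConnectException) {
            // The name resolved, but the TCP connection could not be opened.
            return "TCP connection failed";
        }
        if (e instanceof SocketTimeoutException) {
            return "timed out";
        }
        return "other I/O error";
    }

    public static void main(String[] args) {
        System.out.println(describe(new UnknownHostException("www.instagram.com")));
    }
}
```

Under this reading, the exception from the online IDE says nothing about whether Instagram blocks the request; the login page returned on the original machine is the more relevant symptom.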
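Solution
I passed the URL as a string to a function and used a BufferedReader. It seems you are only printing the body of the page, so you are not getting the source.
This is the code I used: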
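import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class Solution {
    public static void main(String[] args) throws IOException {
        String str = getContent("https://www.instagram.com/mobonews/?__a=1");
        System.out.println(str);
    }

    // Reads the whole response line by line and returns it as one string.
    // readLine() drops the line terminators, so they are not in the result.
    public static String getContent(String url) throws IOException {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();
        StringBuilder response = new StringBuilder();
        // UTF-8 is assumed here; try-with-resources closes the reader on any exit.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }
        return response.toString();
    }
}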
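If the same code sometimes returns JSON and sometimes the login page, it can help to inspect what the server actually sends before reading the body. The following is a minimal sketch, not a guaranteed fix: the class name FetchDiagnostics, its method names, the "Mozilla/5.0" User-Agent value, and the 10-second timeouts are all assumptions chosen for illustration, and whether Instagram serves this JSON to an anonymous client at all is outside the code's control.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class FetchDiagnostics {

    // Extracts the charset parameter from a Content-Type value such as
    // "application/json; charset=utf-8"; falls back to UTF-8 (an assumption).
    public static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Malformed or unsupported charset name: use the fallback.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // True when the Content-Type header announces JSON rather than, say, an HTML login page.
    public static boolean isJson(String contentType) {
        return contentType != null
                && contentType.trim().toLowerCase(Locale.ROOT).startsWith("application/json");
    }

    public static String fetch(String url) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setInstanceFollowRedirects(false);               // make a redirect to a login page visible
        con.setRequestProperty("User-Agent", "Mozilla/5.0"); // assumed value, not required by any spec
        con.setRequestProperty("Accept", "application/json");
        con.setConnectTimeout(10_000);
        con.setReadTimeout(10_000);

        int status = con.getResponseCode();
        String type = con.getContentType();
        System.err.println("status=" + status + " type=" + type
                + " location=" + con.getHeaderField("Location"));
        if (status != HttpURLConnection.HTTP_OK || !isJson(type)) {
            throw new IOException("Expected JSON but got status " + status
                    + " with Content-Type " + type);
        }

        // Read the body as bytes and decode it with the charset the server declared.
        try (InputStream in = con.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), charsetOf(type));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("https://www.instagram.com/mobonews/?__a=1"));
    }
}
```

A 3xx status with a Location header pointing at a login URL, or a 200 response with Content-Type text/html, would indicate that the server is choosing not to serve the JSON to this client, in which case no change to the reading loop will help.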