Problem description

I want to read https://www.instagram.com/mobonews/?__a=1 using Java. The source of this URL is:

{"logging_page_id":"profilePage_1410389643","show_suggested_profiles":false,"show_follow_dialog":false,"graphql":{"user":{"biography":"\u200f\u200e\u0645\u0627\u062c\u0631\u0627\u062c\u0648\u06cc\u06cc\u200c\u0647\u0627\u06cc \u0645\u0646 \u062f\u0631 \u062

But the code below returns this instead:

<!DOCTYPE html><html lang="en" class="no-js not-logged-in client-root">    <head>        <meta charset="utf-8">        <meta http-equiv="X-UA-Compatible" content="IE=edge">        <title>Login • Instagram</title>                <meta name="robots" content="noimageindex, noarchive">        <meta name="apple-mobile-web-app-status-bar-style" content="default">        <meta name="mobile-web-app-capable" content="yes">        <meta name="theme-color" content="#ffffff">        <meta id="viewport" name="viewport"

This is the code I use:

    URL website = new URL(url);
    URLConnection connection = website.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(
                                connection.getInputStream()));

    StringBuilder response = new StringBuilder();
    String inputLine;

    while ((inputLine = in.readLine()) != null) 
        response.append(inputLine);

Edit:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import org.apache.commons.io.IOUtils; // Apache Commons IO

public class TestReadurlInsta {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException {
        URL u = new URL("https://www.instagram.com/mobonews/?__a=1");

        URLConnection con = u.openConnection();
        InputStream in = con.getInputStream();
        // Note: getContentEncoding() returns the Content-Encoding header
        // (e.g. "gzip"), not the charset; it is usually null, so UTF-8 is used.
        String encoding = con.getContentEncoding();
        encoding = encoding == null ? "UTF-8" : encoding;
        String body = IOUtils.toString(in, encoding);
        System.out.println(body);
    }
}
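A side note on the charset lookup in the code above: the character set of a response is normally carried in the `charset` parameter of the Content-Type header, not in Content-Encoding. The following is a minimal sketch of reading it from that header, falling back to UTF-8. The class name `CharsetSketch` and the method `charsetFromContentType` are illustrative names, not part of the original post, and the parsing is deliberately simple rather than a full header parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, introduced only for illustration.
class CharsetSketch {

    /**
     * Extracts the charset parameter from a Content-Type header value such as
     * "text/html; charset=ISO-8859-1". Falls back to UTF-8 when the header is
     * missing, has no charset parameter, or names an unknown charset.
     */
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the "charset=" prefix and any surrounding quotes.
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With a `URLConnection con` as in the code above, the result of `CharsetSketch.charsetFromContentType(con.getContentType())` could be passed to an `InputStreamReader` in place of the value derived from `getContentEncoding()`.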

Edit 2:

For some reason, I seem to be getting Instagram's login page:

<!DOCTYPE html> <html lang="en" class="no-js not-logged-in client-root">     <head>         <meta charset="utf-8">         <meta http-equiv="X-UA-Compatible" content="IE=edge">          <title> Login • Instagram </title>

In the past I ran the same code from the same machine and everything worked fine, but suddenly I ran into this problem.

Edit 3:

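I ran the same code from an online IDE and got the following exception. It seems the connection is being refused; as @Holger said, Instagram may be blocking access to this resource: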

Exception in thread "main" java.net.UnknownHostException: www.instagram.com
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1546)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at java.net.URL.openStream(URL.java:1045)
at HelloWorld.main(HelloWorld.java:14)

Is there any way to solve this?

Tags: java, json, url, stream, httpurlconnection

Solution


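I passed the URL as a string to a function and read it with a BufferedReader. It seems you are only printing the body of the page, which is why you are not getting the source.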

This is the code I used:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Solution {
    public static void main(String[] args) throws IOException {
        String str = getContent("https://www.instagram.com/mobonews/?__a=1");
        System.out.println(str);
    }

    // Reads the whole response for the given URL into a single string.
    public static String getContent(String url) throws IOException {
        URL website = new URL(url);
        URLConnection connection = website.openConnection();

        StringBuilder response = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream).
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                connection.getInputStream()))) {
            String inputLine;
            // Lines are appended without their line terminators.
            while ((inputLine = in.readLine()) != null) {
                response.append(inputLine);
            }
        }
        return response.toString();
    }
}
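Whichever reading loop is used, the symptom from the question (an HTML login page where JSON was expected) can be detected before the body is handed to a JSON parser. The following is a minimal sketch of such a check, not a fix for whatever Instagram does on its side. It assumes the server sends a JSON Content-Type when it really answers with JSON. The class name `ResponseCheck`, its method names, the `Accept` header, and the timeout values are illustrative choices, not from the original post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, introduced only for illustration.
class ResponseCheck {

    /** Heuristic: true if the first non-blank character is '{' or '['. */
    static boolean looksLikeJson(String body) {
        String trimmed = body.trim();
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }

    /** Heuristic: true if the Content-Type header names a JSON media type. */
    static boolean isJsonContentType(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).contains("json");
    }

    /** Fetches an http(s) URL and fails loudly if the answer is not JSON. */
    static String fetch(String url) throws IOException {
        // The cast only works for http:// and https:// URLs.
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestProperty("Accept", "application/json");
        con.setConnectTimeout(10_000);
        con.setReadTimeout(10_000);

        int status = con.getResponseCode();
        String contentType = con.getContentType();
        // For 4xx/5xx responses the body is on the error stream, which may be null.
        InputStream stream = status >= 400 ? con.getErrorStream() : con.getInputStream();

        StringBuilder body = new StringBuilder();
        if (stream != null) {
            // Assumes UTF-8 for simplicity.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
        }

        String text = body.toString();
        if (status != 200 || !isJsonContentType(contentType) || !looksLikeJson(text)) {
            // getURL() reflects any redirect that was followed, e.g. to a login page.
            throw new IOException("Expected JSON but got status " + status
                    + ", Content-Type " + contentType + ", final URL " + con.getURL());
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ResponseCheck <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Run against the URL from the question, a check like this would turn the silent "login page instead of JSON" case into an exception that reports the status code, the Content-Type, and the URL the request ended up at, which makes it easier to tell a redirect to a login page from a network problem.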
