How to use a Jsoup Document method

Question

I'm new to Java and still a beginner. I have this code:

import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test2 {

    public static void main(String[] args) throws IOException {

        try {

            String url        = "http://www.metalbulletin.com/Login.html?ReturnURL=%2fdefault.aspx&";
            String articleURL = "https://www.metalbulletin.com/Article/3838710/Home/CHINA-REBAR-Domestic-prices-recover-after-trading-pick-up.html";

            // Load the login page to obtain the form action, XSRF token and initial cookies.
            Connection.Response loginForm = Jsoup.connect(url)
                    .method(Connection.Method.GET)
                    .execute();

            Document welcomePage = loginForm.parse();
            Element formElement  = welcomePage.body().getElementsByTag("form").get(0);
            String formAction    = formElement.attr("action");

            Elements input = welcomePage.select("input[name=idsrv.xsrf]");
            String securityTokenValue = input.attr("value");

            // Submit the login form, sending back the cookies from the first request.
            Connection.Response mainPage = Jsoup.connect("https://account.metalbulletin.com" + formAction)
                    .data("idsrv.xsrf", securityTokenValue)
                    .data("username", "ifiih@rupayamail.com")
                    .data("password", "Kh457544")
                    .cookies(loginForm.cookies())
                    .method(Connection.Method.POST)
                    .execute();

            Map<String, String> cookies = mainPage.cookies();

            System.out.println("\n\nloginForm.cookies()==>\n" + loginForm.cookies());
            System.out.println("\n\nmainPage.cookies()==>\n" + mainPage.cookies());

            // Fetch the article with the cookies returned by the login request.
            Document articlePage = Jsoup.connect(articleURL).cookies(cookies).get();
            Element article      = articlePage.getElementById("article-body");
            Elements lead1       = article.getElementsByClass("articleContainer");
            System.out.println("\n\nNews Article==>\n" + lead1);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

How can I refactor it to use this:

private Map<String, String> cookies = new HashMap<String, String>();

private Document get(String url) throws IOException {
    Connection connection = Jsoup.connect(url);
    // Send every cookie collected so far with this request.
    for (Map.Entry<String, String> cookie : cookies.entrySet()) {
        connection.cookie(cookie.getKey(), cookie.getValue());
    }
    Connection.Response response = connection.execute();
    // Remember any cookies the server set for later requests.
    cookies.putAll(response.cookies());
    return response.parse();
}

I'm not sure how to call this private Document get(String url) method. It may seem like a silly question, but it matters a lot to me.

How can I call it from within the same class?

Tags: cookies, jsoup, session-cookies, java

Answer


One way to retrieve both the Document and the cookie map is to create a new class named TestThreadHandler, as follows:

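import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Semaphore;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TestThreadHandler implements Runnable {

    private final String url;
    private final Map<String, String> cookies;
    private final Semaphore barrier;
    // Written by run(); read it through getDoc() once the barrier is released.
    private Document doc;

    public TestThreadHandler(String url, Map<String, String> cookies, Semaphore barrier) {
        this.url = url;
        this.cookies = cookies;
        this.barrier = barrier;
    }

    @Override
    public void run() {
        try {
            Connection connection = Jsoup.connect(this.url);

            // Send the cookies collected so far with this request.
            for (Map.Entry<String, String> cookie : this.cookies.entrySet()) {
                connection.cookie(cookie.getKey(), cookie.getValue());
            }
            Connection.Response response = connection.execute();

            // Store any cookies the server set; the caller shares this map.
            this.cookies.putAll(response.cookies());

            this.doc = response.parse();

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Always release, so the waiting caller is never blocked forever.
            this.barrier.release();
        }
    }

    public Document getDoc() {
        return this.doc;
    }

}

Then start that thread from wherever you need it in your Test2 class. A sample call would be:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

import org.jsoup.nodes.Document;

public class Test2 {

    public static void main(String[] args) throws IOException {

        try {

            ...

            String url = "https://www.google.com";
            Map<String, String> cookies = new HashMap<String, String>();
            Semaphore barrier = new Semaphore(0);

            TestThreadHandler handler = new TestThreadHandler(url, cookies, barrier);
            Thread taskThread = new Thread(handler);
            taskThread.start();

            barrier.acquireUninterruptibly(1); // Wait until the thread finishes

            // cookies now holds whatever the response set; doc is null if the request failed
            Document doc = handler.getDoc();

            ...

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}

The cookie map is shared by reference, so the entries the thread adds are visible in main. The Document is different: Java passes references by value, so assigning a new Document inside the thread cannot change a variable declared in main. That is why the handler keeps the parsed document in a field and exposes it through getDoc().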
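
The difference between filling a shared map and reassigning a reference can be checked without any network access. The following is only a minimal sketch, not code from the original answer: the class PassByValueDemo, its nested Worker, and the String used as a stand-in for a parsed Document are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

class PassByValueDemo {

    // Hypothetical worker: mirrors the shape of a Runnable handler, without jsoup.
    static class Worker implements Runnable {
        private final Map<String, String> cookies;
        private final Semaphore barrier;
        private String doc; // stand-in for org.jsoup.nodes.Document

        Worker(String doc, Map<String, String> cookies, Semaphore barrier) {
            this.doc = doc;
            this.cookies = cookies;
            this.barrier = barrier;
        }

        @Override
        public void run() {
            try {
                cookies.put("session", "abc123");  // mutates the caller's map object
                this.doc = "<html>parsed</html>";  // only changes this object's field
            } finally {
                barrier.release();                 // signal the waiting caller
            }
        }

        String getDoc() {
            return doc;
        }
    }

    public static void main(String[] args) {
        String doc = null;
        Map<String, String> cookies = new HashMap<>();
        Semaphore barrier = new Semaphore(0);

        Worker worker = new Worker(doc, cookies, barrier);
        new Thread(worker).start();
        barrier.acquireUninterruptibly(); // wait until the worker has released

        System.out.println(cookies);         // the map was filled by the worker
        System.out.println(doc);             // the local variable was never reassigned
        System.out.println(worker.getDoc()); // the value the worker produced
    }
}
```

The Semaphore release and acquire pair also gives the caller a consistent view of the field written by the worker, so reading getDoc() after the acquire is safe in this sketch.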

For further explanation, see the Java documentation on threads, or feel free to ask me!

Hope this helps!
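
For the narrower question of calling a private instance method such as get(String url) from main in the same class, a thread is not strictly required: main is static, so it only needs an instance of the class first. The sketch below is an illustration under stated assumptions, not the answerer's method. SessionSketch and fakeResponseCookies are hypothetical names, and fakeResponseCookies stands in for the real jsoup request so the example runs without the jsoup jar or a network connection.

```java
import java.util.HashMap;
import java.util.Map;

class SessionSketch {

    // Instance state, like the cookies field in the question.
    private final Map<String, String> cookies = new HashMap<>();

    // Hypothetical stand-in for the cookies a real response would carry:
    // pretends the server sets one new cookie per request.
    private Map<String, String> fakeResponseCookies(String url) {
        Map<String, String> set = new HashMap<>();
        set.put("visited-" + cookies.size(), url);
        return set;
    }

    // Same shape as the question's get(String url), minus jsoup.
    private String get(String url) {
        cookies.putAll(fakeResponseCookies(url));
        return "document for " + url;
    }

    public static void main(String[] args) {
        // main is static, so create an instance before calling the
        // non-static get(); private access is allowed inside the same class.
        SessionSketch session = new SessionSketch();
        String first = session.get("https://example.com/login");
        String second = session.get("https://example.com/article");

        System.out.println(first);
        System.out.println(second);
        System.out.println(session.cookies.size()); // cookies accumulated across both calls
    }
}
```

In the question's code, the same pattern would mean creating an instance of Test2 in main and calling get(url) on it, with the real jsoup calls in place of the stand-in.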

