HtmlUnit gets the table on the first pass through a loop, but not the second

Question

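I am parsing a web page with HtmlUnit. The page has a number of inputs that I set programmatically, and then I click the submit button. The analysis results are returned on the same page, below the inputs.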

The parser works fine on the first pass through the loop, but not on the second. Here is the code:

public void getPortfolioVisualizerData(List<String> symbols) throws Exception {
        final WebClient webClient = new WebClient();
        final HtmlPage page = webClient.getPage("https://www.portfoliovisualizer.com/backtest-portfolio#analysisResults");
        HtmlForm form = page.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");

        //Time Period combobox
        HtmlSelect select = (HtmlSelect) page.getElementById("timePeriod");
        HtmlOption option = select.getOptionByValue("4");   
        select.setSelectedAttribute(option, true);

        //Start Year combobox
        select = (HtmlSelect) page.getElementById("startYear");
        option = select.getOptionByValue("1985");  
        select.setSelectedAttribute(option, true);

        //End Year combobox
        select = (HtmlSelect) page.getElementById("endYear");
        option = select.getOptionByValue("2018");  
        select.setSelectedAttribute(option, true);

        //Initial Amount text input
        HtmlTextInput textField = form.getInputByName("initialAmount");
        textField.type("10000");

        //Periodic Adjustment combobox
        select = (HtmlSelect) page.getElementById("annualOperation");
        option = select.getOptionByValue("0");  
        select.setSelectedAttribute(option, true);

        //Rebalancing combobox
        select = (HtmlSelect) page.getElementById("rebalanceType");
        option = select.getOptionByValue("1");  
        select.setSelectedAttribute(option, true);

        //Display Income combobox
        select = (HtmlSelect) page.getElementById("showYield");
        option = select.getOptionByValue("false");  
        select.setSelectedAttribute(option, true);

        //Benchmark combobox
        select = (HtmlSelect) page.getElementById("benchmark");
        option = select.getOptionByValue("VFINX");  
        select.setSelectedAttribute(option, true);

        //Allocation 1 text input
        textField = form.getInputByName("allocation1_1");
        textField.type("100");
        HtmlSubmitInput button = (HtmlSubmitInput)page.getElementById("submitButton");
        Data data = new Data();

        for (String symbol:symbols) {
            //Asset 1 text input
            textField = form.getInputByName("symbol1");
            textField.type(symbol);

            // Now submit the form by clicking the Analyze Portfolios button and get back the second page.
            HtmlPage page2 = button.click();
            HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1);   //the second table on the page
            int rowNum = 0;
            for (HtmlTableRow row : table.getRows()) {
                rowNum++;
                if (rowNum==1) continue;    //skip table header values
                int colNum = 0;
                for (HtmlTableCell cell : row.getCells()) {
                    colNum++;
                    if (rowNum==2) {
                        data.Symbol = symbol;
                        String val = cell.asText();
                        switch(colNum) {
                            case 4:  data.CAGR               = val.replace("%", ""); break;
                            case 5:  data.StdDev             = val.replace("%", ""); break;
                            case 6:  data.BestYear           = val.replace("%", ""); break;
                            case 7:  data.WorstYear          = val.replace("%", ""); break;
                            case 8:  data.MaxDrawdown        = val.replace("%", ""); break;
                            case 9:  data.SharpRatio         = val;                  break;
                            case 10: data.SortinoRatio       = val;                  break;
                            case 11: data.CorrelationToUsMkt = val;
                        }
                    }

                }
            }
            saveStock(data);
            button = (HtmlSubmitInput)page2.getElementById("submitButton");
            form = page2.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");
        }
    }

It throws a java.lang.IndexOutOfBoundsException: Index: 1, Size: 0 on this line:

HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1);   //the second table on the page

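The table I am interested in is the second table on the page, but the error seems to indicate that no tables at all are found on the second pass through the loop. Why not? If I enter the second symbol manually, the page returns the table I am interested in.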

Tags: java, htmlunit

Answer


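I think you should add a delay after the click and before fetching the table via XPath. The code may be querying the page before the second page has finished loading.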
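One way to add such a delay without hard-coding a fixed sleep is to poll until the expected number of tables shows up or a timeout expires. The following is only a minimal sketch in plain Java: the class name WaitForElements, the method waitForAtLeast, and the timeout values are hypothetical and not part of HtmlUnit, and it assumes the missing table really is a timing problem.

```java
import java.util.List;
import java.util.function.Supplier;

public class WaitForElements {

    /**
     * Hypothetical helper (not an HtmlUnit API): re-runs the query until it
     * returns at least minSize items or timeoutMillis has elapsed, sleeping
     * pollMillis between attempts. Returns the last result either way, so the
     * caller must still check the size before indexing into the list.
     */
    public static <T> List<T> waitForAtLeast(Supplier<List<T>> query,
                                             int minSize,
                                             long timeoutMillis,
                                             long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<T> result = query.get();
        while (result.size() < minSize && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMillis);
            result = query.get();
        }
        return result;
    }
}
```

Inside the loop from the question, the lookup could then be written roughly as `List<HtmlTable> tables = WaitForElements.waitForAtLeast(() -> page2.getByXPath("//table[@class='table table-striped table-condensed']"), 2, 10_000, 250);`, followed by a check that `tables.size() >= 2` before calling `tables.get(1)`. This assumes an HtmlUnit version in which `getByXPath` returns a generic list; older versions may need a cast. HtmlUnit also provides `webClient.waitForBackgroundJavaScript(timeoutMillis)`, which can be called right after `button.click()` if the results are filled in by JavaScript.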

