java.lang.arrayindexoutofboundsexception jsoup

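Problem description

I am trying to extract all images from a website and analyze each one with the AWS image recognition API. It works for some websites, but others return the error message: "500 server error java.lang.arrayindexoutofboundsexception index:281 size 281".

Basically, I use jsoup to scrape the images, creating an object for each image that stores its name and image URL. After that, I call the API and check the ArrayList. For some reason, it only works for some websites.

Can someone explain what I am doing wrong and how to prevent this error?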

@WebServlet(name = "HelloAppEngine", urlPatterns = {
    "/hello"
})
public class HelloAppEngine extends HttpServlet {

    static ArrayList<ResponseData> testImages = new ArrayList<>();
    static AmazonRekognition rekognitionClient = AmazonRekognitionClientBuilder.defaultClient();

    public static void getimages() throws MalformedURLException, IOException {

        System.out.println("getImages called" + testImages);
        int index = 0;
        for (ResponseData data: testImages) {

            System.err.println("open stream for:" + data.getUrl());
            ByteBuffer imageBytes = null;
            try (InputStream inputStream = new URL(data.getUrl()).openStream()) {
                System.out.println(inputStream);
                imageBytes = ByteBuffer.wrap(IOUtils.toByteArray(inputStream));

                System.out.println(imageBytes);

            } catch (IOException e1) {
                System.err.println(e1.getMessage());
            }


            //
            DetectLabelsRequest request = new DetectLabelsRequest().withImage(new Image().withBytes(imageBytes)); //.withMaxLabels(10).withMinConfidence(77F);


            try {

                DetectLabelsResult result = rekognitionClient.detectLabels(request);
                List<Label> labels = result.getLabels();
                //System.out.println(labels);
                //System.out.println("Detected labels for " + photo+""+labels);
                for (Label label : labels) {
                    // loop through all labels of the object
                    // create a new ResponseData object for each image
                    // this is where I am getting the error
                    if (testImages.get(index) != null) {
                        ResponseData d = testImages.get(index);
                        d.setName(label.getName());
                        testImages.set(index, d);
                        // increment for making new image url and name
                        index++;

                        System.out.println(label.getName() + ": " + label.getConfidence().toString());
                    }
                }
                //
            } catch (AmazonRekognitionException e) {
                System.err.println(e.getMessage());
            }

        }
    }

    private static final long serialVersionUID = 1L;

    protected static final Gson GSON = new GsonBuilder().create();

    // This is just a test array

    ArrayList<String> list = new ArrayList<String>();

    @Override

    protected final void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {

        resp.setContentType("text/json");
        String servlet = req.getServletPath();
        System.setProperty("http.proxyHost", "192.168.5.1");
        System.setProperty("http.proxyPort", "1080");
        log("servlet:" + servlet);
        if (servlet.equalsIgnoreCase("/main")) {
            log("if body start");

            String urlString = java.net.URLDecoder.decode(req.getParameter("url"), "UTF-8");

            // Connect to website. This can be replaced with your file loading
            // implementation
            Document doc = Jsoup.connect(urlString).get();

            // Get all img tags
            Elements img = doc.getElementsByTag("img");
            Elements media = doc.select("[src]");
            int counter = 0;

            // Loop through img tags
            for (Element src: media) {
                if (src.tagName().equals("img")) {
                    counter++;
                    // create a ResponseData object for each image URL
                    ResponseData data = new ResponseData();
                    // set the object's URL to the image URL
                    data.setUrl(src.attr("abs:src"));
                    // placeholder name, to be set from the AWS result
                    data.setName(" ");
                    testImages.add(data);
                    // getimages();
                }
                if (src.tagName().equals("link[href~=.*\\.(ico|png)]")) {
                    System.out.println("image is logo");
                }
                if (src.tagName().equals("meta[itemprop=image]")) {
                    System.out.println("image is logosss");
                }

            }
        }
        //log("list" + testImages);
        getimages();
        //

        // getimages();
        System.err.println(GSON.toJson(testImages));
        resp.getWriter().println(GSON.toJson(testImages));
    }

    @Override
    protected final void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        doPost(req, resp);
    }
}

Tags: java, amazon-web-services

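Solution

You are trying to get the 282nd image (index = 281) from testImages, but it only contains 281 (last index = 280). You advance to the next image for every label, and there can be more labels than images. Try printing how many of each there are: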

System.out.println("testImages.size() is: " + testImages.size());
System.out.println("labels.size() is: " + labels.size());
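
The overrun can be reproduced without jsoup or AWS. The following is only a sketch under stated assumptions: the class name IndexOverrunDemo, the method overruns, and the sample image and label names are hypothetical, and plain strings stand in for the question's ResponseData and Label objects. It mimics a single index that is advanced once per label rather than once per image.

```java
import java.util.Arrays;
import java.util.List;

public class IndexOverrunDemo {

    /**
     * Mimics the loop in the question: one shared index that is advanced
     * once per label, not once per image. Returns true if that index ran
     * past the end of the image list.
     */
    static boolean overruns(List<String> images, List<List<String>> labelsPerImage) {
        int index = 0;
        try {
            for (int i = 0; i < images.size(); i++) {
                for (String label : labelsPerImage.get(i)) {
                    images.get(index); // throws once index == images.size()
                    index++;
                }
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> images = Arrays.asList("a.png", "b.png");
        // Hypothetical label results: three labels for the first image alone.
        List<List<String>> labels = Arrays.asList(
                Arrays.asList("Dog", "Pet", "Animal"),
                Arrays.asList("Car"));
        System.out.println("overruns: " + overruns(images, labels));
    }
}
```

With two images and three labels on the first one, the third label already asks for index 2 in a list of size 2, which is the same pattern as "index:281 size 281".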

To avoid reading past the end of testImages, try replacing this condition:

if (testImages.get(index) != null) {

with:

if (index < testImages.size() && testImages.get(index) != null) {
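
The guard prevents the exception, but the shared index still advances once per label, so a label can end up stored on a different image than the one it was detected in. If the intent is one name per image (an assumption, since the question does not say), one option is to index by image instead. This is a minimal sketch, not the original code: the class LabelPerImage, the method assignFirstLabels, the example URLs, and the nested ResponseData stand-in are all hypothetical, and lists of strings replace the Rekognition results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabelPerImage {

    /** Hypothetical stand-in for the question's ResponseData class. */
    static class ResponseData {
        private final String url;
        private String name = " ";

        ResponseData(String url) { this.url = url; }
        String getUrl() { return url; }
        String getName() { return name; }
        void setName(String name) { this.name = name; }
    }

    /**
     * Writes the first label of image i onto image i.
     * labelsPerImage.get(i) stands in for the labels detected for images.get(i).
     */
    static void assignFirstLabels(List<ResponseData> images, List<List<String>> labelsPerImage) {
        // Bound the loop by both lists so neither can be overrun.
        int count = Math.min(images.size(), labelsPerImage.size());
        for (int i = 0; i < count; i++) {
            List<String> labels = labelsPerImage.get(i);
            if (!labels.isEmpty()) {
                images.get(i).setName(labels.get(0));
            }
        }
    }

    public static void main(String[] args) {
        List<ResponseData> images = new ArrayList<>();
        images.add(new ResponseData("https://example.com/a.png"));
        images.add(new ResponseData("https://example.com/b.png"));

        // Hypothetical label results, one inner list per image.
        List<List<String>> labels = Arrays.asList(
                Arrays.asList("Dog", "Pet", "Animal"),
                Arrays.asList("Car"));

        assignFirstLabels(images, labels);
        for (ResponseData image : images) {
            System.out.println(image.getUrl() + " -> " + image.getName());
        }
    }
}
```

Keeping only the first label is a design choice made for brevity here; storing all labels per image would need a list field instead of a single name.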
