How do I get the "href" attribute from an "a" tag in Jsoup?

Question

I am working on a web scraping project that fetches images from Google Images. To get the image src, I use

Element.attr("href")

However, it returns

#

My code

Document shivWall = Jsoup.connect(searchURL).get();
Elements smallImgElements = shivWall.getElementsByClass("rg_bx rg_di rg_el ivg-i");
smallImgElements.get(0).select("a.rg_l").get(0).attr("href");

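I have tried many approaches, but none of them worked. I even double-checked by changing the attr argument to some random value, and it returned null as expected. For "href", however, it returns only a "#". Please help.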

Tags: java, android, web-scraping, jsoup

Solution

To get the src attribute of the img tags on a page, you can try the following code snippet:

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DownloadImages {

    // The URL of the website. This is just an example.
    private static final String webSiteURL = "http://www.supercars.net/gallery/119513/2841/5.html";

    // The path of the folder that you want to save the images to
    private static final String folderPath = "<FOLDER PATH>";

    public static void main(String[] args) {
        try {
            // Connect to the website and get the HTML
            Document doc = Jsoup.connect(webSiteURL).get();

            // Get all elements with the img tag
            Elements img = doc.getElementsByTag("img");

            for (Element el : img) {
                // For each element, get the absolute src URL
                String src = el.absUrl("src");

                // absUrl returns an empty string when the URL cannot be resolved
                if (src.isEmpty()) {
                    continue;
                }

                System.out.println("Image Found!");
                System.out.println("src attribute is : " + src);
                getImages(src);
            }
        } catch (IOException ex) {
            System.err.println("There was an error");
            Logger.getLogger(DownloadImages.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private static void getImages(String src) throws IOException {
        // Extract the name of the image from the src attribute
        int indexname = src.lastIndexOf("/");

        // Drop a trailing slash so that the last path segment is the file name
        if (indexname == src.length() - 1) {
            src = src.substring(0, indexname);
        }

        indexname = src.lastIndexOf("/");

        // The name keeps its leading "/", so folderPath + name is a valid path
        String name = src.substring(indexname);
        System.out.println(name);

        // Open a URL stream and copy it to the target file;
        // try-with-resources closes both streams even if the copy fails
        URL url = new URL(src);
        try (InputStream in = url.openStream();
             OutputStream out = new BufferedOutputStream(new FileOutputStream(folderPath + name))) {
            for (int b; (b = in.read()) != -1;) {
                out.write(b);
            }
        }
    }
}
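The byte-by-byte loop in getImages can also be expressed with the standard library's Files.copy. The following is only a sketch of that alternative: the class name StreamToFile and the helper save are made up for illustration, and an in-memory stream stands in for url.openStream() so the example runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamToFile {

    // Hypothetical helper: writes everything from `in` to `target`,
    // replacing the file if it already exists, and returns the byte count.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temp file and an in-memory stream stand in for the image download
        Path target = Files.createTempFile("image", ".bin");
        long written = save(new ByteArrayInputStream(new byte[] {1, 2, 3}), target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

In getImages, the same helper could be called as save(url.openStream(), Paths.get(folderPath + name)), assuming folderPath points to an existing directory.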

This is how you can download images from a website using Jsoup.
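The file name step in getImages relies on lastIndexOf("/"), which would also pick up a query string such as ?size=large if the src has one. A minimal sketch of a more defensive variant, using only java.net.URI, might look like this. The class ImageFileName, the method fromUrl, and the example.com URLs are all hypothetical.

```java
import java.net.URI;

public class ImageFileName {

    // Hypothetical helper: returns the last path segment of a URL,
    // ignoring any query string, fragment, or trailing slashes.
    // URI.create throws IllegalArgumentException for malformed input.
    static String fromUrl(String src) {
        String path = URI.create(src).getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        // Skip trailing slashes
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        // The segment starts right after the previous slash (or at 0)
        int start = path.lastIndexOf('/', end - 1) + 1;
        return path.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(fromUrl("http://example.com/gallery/car.jpg?size=large")); // car.jpg
        System.out.println(fromUrl("http://example.com/gallery/"));                   // gallery
    }
}
```

Unlike the name computed in getImages, the result here has no leading "/", so it would need a separator when joined with folderPath.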

Hope this helps.
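As for the "#" in the question: attr("href") returns whatever value is in the HTML that Jsoup received, and Jsoup does not execute JavaScript. One plausible explanation, not confirmed here, is that the page ships placeholder links with href="#" and fills in the real targets with scripts. The sketch below uses made-up markup (not real Google output) and a hypothetical helper firstAttr to show that behavior; it assumes a Jsoup version that provides selectFirst.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HrefCheck {

    // Hypothetical helper: returns the raw value of `attrName` on the first
    // element matching `selector`, or null if no element matches.
    static String firstAttr(String html, String selector, String attrName) {
        Document doc = Jsoup.parse(html);
        Element el = doc.selectFirst(selector);
        return el == null ? null : el.attr(attrName);
    }

    public static void main(String[] args) {
        // Made-up markup that mimics a placeholder link
        String html = "<div class=\"rg_bx\">"
                + "<a class=\"rg_l\" href=\"#\"><img src=\"thumb.jpg\"></a>"
                + "</div>";

        // The attribute exists and its literal value is "#"
        System.out.println(firstAttr(html, "a.rg_l", "href"));

        // A missing attribute yields an empty string from attr(), not null
        System.out.println(firstAttr(html, "a.rg_l", "no-such-attribute").isEmpty());
    }
}
```

Printing the fetched document (for example shivWall.outerHtml()) and searching for the link is a quick way to check what the server actually sent.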

