Unable to count words and characters with pdfbox in Java

Problem description

class ReadPDF {


    public void Read() throws IOException {

        int amountOfWords = 0;
        int amountOfChars = 0;
        String sourceCode ="";

        try {
            PDDocument doc = PDDocument.load(new File("C:\\Users\\ccw\\Desktop\\articles\\RECYCLING-BEHAVIOUR-AMONG-MALAYSIAN-TERTIARY-STUDENTS.pdf"));
            String text = new PDFTextStripper().getText(doc);

            sourceCode = sourceCode.replace ("-", "").replace (".", "");

            while(doc!=null){
                String[] words = sourceCode.split(" ");
                amountOfWords = amountOfWords + words.length;
                for (String word : words) {
                    amountOfChars = amountOfChars + word.length();
                }
            }

            System.out.println("Amount of Chars is " + amountOfChars);
            System.out.println("Amount of Words is " + (amountOfWords + 1));
            System.out.println("Average Word Length is "+ (amountOfChars/amountOfWords));


        }catch (IOException e) {
            System.out.println(e);
        }

    }

}

I am trying to count all the words and characters in a PDF file using pdfbox. But now I get an error saying sourceCode is not initialized.

Tags: java, pdfbox

Solution


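Replace this line:

    sourceCode = sourceCode.replace ("-", "").replace (".", "");

with:

    sourceCode = text.replace ("-", "").replace (".", "");

and delete the while loop. In the original code the extracted text is never used, so sourceCode stays empty, and because doc is never set to null the condition doc!=null is always true and the loop never terminates.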
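As a rough illustration of the fix, the counting step can be pulled out into a small helper that works on the already extracted string. This is only a sketch: the class name WordStats and the method count are hypothetical, and the input is assumed to be the string returned by new PDFTextStripper().getText(doc) in the question. It also makes two choices that are not part of the answer above: it splits on runs of whitespace instead of a single space, because extracted PDF text usually contains line breaks, and it guards against an empty input before dividing.

```java
class WordStats {

    /**
     * Returns {wordCount, charCount} for the given text.
     * Hyphens and periods are dropped first, as in the question.
     */
    static int[] count(String text) {
        String cleaned = text.replace("-", "").replace(".", "").trim();
        if (cleaned.isEmpty()) {
            return new int[] {0, 0};
        }
        // Split on any run of whitespace (spaces, tabs, line breaks)
        String[] words = cleaned.split("\\s+");
        int chars = 0;
        for (String word : words) {
            chars += word.length();
        }
        return new int[] {words.length, chars};
    }

    public static void main(String[] args) {
        // In the real program this string would come from
        // new PDFTextStripper().getText(doc); a literal is used here
        // so the sketch runs without pdfbox on the classpath.
        String text = "Re-cycling is good.\nVery good.";
        int[] stats = count(text);
        System.out.println("Amount of Words is " + stats[0]);
        System.out.println("Amount of Chars is " + stats[1]);
        if (stats[0] > 0) {
            // Cast to double to avoid integer division
            System.out.println("Average Word Length is " + (double) stats[1] / stats[0]);
        }
    }
}
```

No loop is needed, since getText returns the text of the whole document in a single string. The original code also printed amountOfWords + 1 and used integer division for the average; the sketch reports the word count as is and computes the average as a double.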

