Java - Why is whitespace being read in as a word?

Problem description

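I am trying to read a .txt file and print the twenty most frequent and the twenty least frequent words. However, whitespace is being counted as a word when it should not be. As my output shows, a blank entry appears as the second most common word, and it should not be there. Here is my code.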

public class WordFreqCount {

    public static void main(String[] args) throws IOException {


        HashMap<String, Integer> frequencyMap = new HashMap<String, Integer>();

        FileReader bookFile = new FileReader("book.txt");
        Scanner s = new Scanner(new FileReader("book.txt"));


        while(s.hasNext()) {
            String line = s.nextLine();
            line.trim();
            String[] words = line.split("\\W+");
            for (int i = 0; i < words.length; i ++) {

                if (frequencyMap.containsKey(words[i])) {
                    frequencyMap.replace(words[i], frequencyMap.get(words[i]) + 1);
                }
                else {
                    frequencyMap.put(words[i], 1);
                }
            }
        }
        s.close();

        List<Entry<String,Integer>> list = sortByValue(frequencyMap);


        System.out.println("Top 20 Most Appeared Words:");

        int counter1 = 1;
        List<Map.Entry<String, Integer>> topTwenty = list.subList(0, 20);
        for(Map.Entry<String, Integer> word : topTwenty) {
            System.out.println("(" + counter1 + "): " + word.getKey() + " --> " + word.getValue());
            counter1 += 1;
        }

        System.out.println();
        System.out.println("Top 20 Least Appeared Words:");

        int counter2 = 1;
        Collections.reverse(list);
        List<Map.Entry<String, Integer>> bottomTwenty = list.subList(0, 20);
        for(Map.Entry<String, Integer> word : bottomTwenty) {
            System.out.println("(" + counter2 + "): " + word.getKey() + " --> " + word.getValue());
            counter2 += 1;
        }
    }
}

The output of my code is

(1): the --> 5426
(2):  --> 4986
(3): I --> 3038
(4): and --> 2887
(5): to --> 2788
(6): of --> 2733
(7): a --> 2595
(8): in --> 1747
(9): that --> 1664
(10): was --> 1393
(11): it --> 1303
(12): you --> 1283
(13): he --> 1168
(14): is --> 1131
(15): his --> 1103
(16): have --> 908
(17): my --> 907
(18): with --> 849
(19): had --> 821
(20): as --> 780

Top 20 Least Appeared Words:
(1): rival --> 1
(2): category --> 1
(3): arguments --> 1
(4): Bought --> 1
(5): billycock --> 1
(6): incoherent --> 1
(7): hail --> 1
(8): idle --> 1
(9): illustrious --> 1
(10): terminated --> 1
(11): Apaches --> 1
(12): topped --> 1
(13): laudanum --> 1
(14): filthy --> 1
(15): drama --> 1
(16): tune --> 1
(17): geology --> 1
(18): Mademoiselle --> 1
(19): balls --> 1
(20): Atkinson --> 1

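My understanding is that using line.split("\\W+") parses the sentence so that nothing like spaces, commas, etc. is counted as a word or as part of a word. Am I missing something here?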

A small part of the book.txt file

==================================================== ==========================

almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: The Adventures of Sherlock Holmes

Author: Arthur Conan Doyle

Posting Date: April 18, 2011 [EBook #1661]
First Posted: November 29, 2002

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK THE ADVENTURES OF SHERLOCK HOLMES ***




Produced by an anonymous Project Gutenberg volunteer and Jose Menendez









THE ADVENTURES OF SHERLOCK HOLMES

by

SIR ARTHUR CONAN DOYLE



   I. A Scandal in Bohemia
  II. The Red-headed League
 III. A Case of Identity
  IV. The Boscombe Valley Mystery
   V. The Five Orange Pips
  VI. The Man with the Twisted Lip
 VII. The Adventure of the Blue Carbuncle
VIII. The Adventure of the Speckled Band
  IX. The Adventure of the Engineer's Thumb
   X. The Adventure of the Noble Bachelor
  XI. The Adventure of the Beryl Coronet
 XII. The Adventure of the Copper Beeches




ADVENTURE I. A SCANDAL IN BOHEMIA

I.

To Sherlock Holmes she is always THE woman. I have seldom heard
him mention her under any other name. In his eyes she eclipses
and predominates the whole of her sex. It was not that he felt
any emotion akin to love for Irene Adler. All emotions, and that
one particularly, were abhorrent to his cold, precise but
admirably balanced mind. He was, I take it, the most perfect
reasoning and observing machine that the world has seen, but as a
lover he would have placed himself in a false position. He never
spoke of the softer passions, save with a gibe and a sneer. They
were admirable things for the observer--excellent for drawing the
veil from men's motives and actions. But for the trained reasoner
to admit such intrusions into his own delicate and finely
adjusted temperament was to introduce a distracting factor which
might throw a doubt upon all his mental results. Grit in a
sensitive instrument, or a crack in one of his own high-power
lenses, would not be more disturbing than a strong emotion in a
nature such as his. And yet there was but one woman to him, and
that woman was the late Irene Adler, of dubious and questionable
memory.

I had seen little of Holmes lately. My marriage had drifted us
away from each other. My own complete happiness, and the
home-centred interests which rise up around the man who first
finds himself master of his own establishment, were sufficient to
absorb all my attention, while Holmes, who loathed every form of
society with his whole Bohemian soul, remained in our lodgings in
Baker Street, buried among his old books, and alternating from
week to week between cocaine and ambition, the drowsiness of the
drug, and the fierce energy of his own keen nature. He was still,
as ever, deeply attracted by the study of crime, and occupied his
immense faculties and extraordinary powers of observation in
following out those clues, and clearing up those mysteries which
had been abandoned as hopeless by the official police. From time
to time I heard some vague account of his doings: of his summons
to Odessa in the case of the Trepoff murder, of his clearing up
of the singular tragedy of the Atkinson brothers at Trincomalee,
and finally of the mission which he had accomplished so
delicately and successfully for the reigning family of Holland.
Beyond these signs of his activity, however, which I merely
shared with all the readers of the daily press, I knew little of
my former friend and companion.

Tags: java

Solution


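I think the main point here is that your trim call does not actually do anything. This construct

line.trim();

does not trim the string in place; it returns a new, trimmed string, which is discarded here. Strings are immutable in Java. You should do something like this

String line = s.nextLine().trim();

to make the trim take effect. Note that trimming only removes leading and trailing whitespace, so a blank line in the file still splits into a single empty string that has to be skipped when counting.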
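The following is a minimal sketch of the counting loop, not the asker's full program. It assigns the result of trim() and also skips empty tokens, because "".split("\\W+") returns a one-element array holding the empty string, and a line that begins with punctuation (such as an opening quote) yields a leading empty token. The class name WordTokens, the method countWords, and the sample lines are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordTokens {

    // Counts words across the given lines, ignoring the empty tokens
    // that split() can produce. (Hypothetical helper for illustration.)
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> frequencyMap = new HashMap<>();
        for (String rawLine : lines) {
            // trim() returns a new String, so the result must be assigned.
            String line = rawLine.trim();
            for (String word : line.split("\\W+")) {
                // A blank line yields [""], and a line starting with a
                // non-word character yields a leading "", so skip those.
                if (word.isEmpty()) {
                    continue;
                }
                // Insert 1 for a new word, otherwise add 1 to the old count.
                frequencyMap.merge(word, 1, Integer::sum);
            }
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        // Splitting an empty string gives one element: the empty string.
        System.out.println("".split("\\W+").length);                 // prints 1
        // Leading non-word characters produce an empty first token.
        System.out.println(Arrays.toString("  a b".split("\\W+")));  // prints [, a, b]

        // Made-up sample input: indented line, blank line, quoted line.
        List<String> sample = Arrays.asList("  the cat", "", "\"the dog\"");
        System.out.println(countWords(sample));
    }
}
```

With the made-up sample above, "the" is counted twice, "cat" and "dog" once each, and no empty key ends up in the map. The same isEmpty() check could be dropped into the original loop in place of relying on trim() alone.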

