Finding common words and their frequencies

Problem description

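I am trying to find the frequencies of the words that two text files have in common. The code I have so far is below. I created a Word class to hold a word and its count, but I am struggling to compute the frequency of the common words. The problem is in the TaskUtils class, and I cannot finish the task because I simply do not know how to do it. Any help would be greatly appreciated.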

TaskUtils class:

public static List<Word> CommonWords(List<string> file1, List<string> file2)
{
    List<Word> allCommonWords = new List<Word>();

    var first = file1.GroupBy(x => x).Where(x => x.Count() == 1).SelectMany(x => x);

    int singleWordCount = 0;

    foreach (var word in first)
    {
        if (file2.Contains(word) && allCommonWords.Count < 10)
        {
            singleWordCount++;
            allCommonWords.Add(new Word(word, singleWordCount));
        }
    }
    allCommonWords = allCommonWords.OrderByDescending(x => x.wordCount).ThenBy(x => x.word).ToList<Word>();
    return allCommonWords;
}

Word class:

class Word
{
    public string word { get; set; }
    public int wordCount { get; set; }

    public Word(string word, int wordCount)
    {
        this.word = word;
        this.wordCount = wordCount;
    }
}

Main:

static void Main(string[] args)
{
    string data1 = "Book1.txt";
    string data2 = "Book2.txt";
    string result = "Result.txt";

    File.Delete(result);

    List<string> file1 = InOut.Read(data1);
    List<string> file2 = InOut.Read(data2);

    List<string> uniqueWords = TaskUtils.UniqueWords(file1, file2);
    InOut.PrintUniqueWords(result, uniqueWords);

    List<Word> commonWords = TaskUtils.CommonWords(file1, file2);
    InOut.PrintCommonWords(result, commonWords);
}

Common words result:

Common word count: 7
--------------------------------------
| Nr|          Word|      Frequency|
--------------------------------------
|  1|      prevailed|             7|
|  2|         broken|             6|
|  3|           sort|             5|
|  4|        victory|             4|
|  5|            had|             3|
|  6|            she|             2|
|  7|            but|             1|
--------------------------------------

Tags: c#

Solution


Try using a dictionary keyed by word and simply incrementing the Count property of the Word class each time the word is seen:

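// Requires: using System.Threading; (for Interlocked)
public sealed class Word
{
    public string Value { get; private set; }
    private int _count;
    public int Count => _count;

    public Word(string word)
    {
        this.Value = word;
    }

    // Atomically adds one to the count and returns the new value,
    // so it stays correct even if called from several threads.
    public int Increment()
    {
        return Interlocked.Increment(ref _count);
    }
}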

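public static List<Word> CommonWords(List<string> file1, List<string> file2)
{
    // Pre-size the dictionary to the larger input to limit resizing.
    var size = Math.Max(file1.Count, file2.Count);
    var dict = new Dictionary<string, Word>(size);

    // Note: this tallies every word from either file, not only the shared ones.
    FillDict(file1, dict);
    FillDict(file2, dict);

    // Most frequent words first.
    return dict.Values.OrderByDescending(x => x.Count).ToList();
}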

private static void FillDict(List<string> list, Dictionary<string, Word> dict)
{
    // Single-threaded version. To parallelize (e.g. with Parallel.ForEach),
    // guard the dictionary with a lock or switch to ConcurrentDictionary<>.
    foreach (var item in list)
    {
        // Look up the existing entry, or create one the first time a word is seen.
        if (!dict.TryGetValue(item, out var word))
        {
            word = new Word(item);
            dict[item] = word;
        }
        word.Increment();
    }
}
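
Note that CommonWords in the answer above tallies every word that appears in either file, so words found in only one file also end up in the result. If only words present in both files should be reported, one possible variation is to count each file separately and keep the intersection. The following is a minimal sketch under a few assumptions that are not stated in the original post: each list element is a single word, "frequency" means the total number of occurrences across both files, matching is case-sensitive, and the result is capped at 10 entries (mirroring the `Count < 10` check in the question). The names CommonWordSketch, CountWords and CommonWordCounts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonWordSketch
{
    // Hypothetical helper (not from the original post):
    // counts how many times each word occurs in one list.
    private static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var w in words)
        {
            counts.TryGetValue(w, out var n); // n stays 0 when the word is new
            counts[w] = n + 1;
        }
        return counts;
    }

    // Returns (word, occurrences in file1 + occurrences in file2) for words
    // present in BOTH lists, most frequent first, ties broken alphabetically.
    // Assumption: "frequency" is the combined total across both files.
    public static List<KeyValuePair<string, int>> CommonWordCounts(
        List<string> file1, List<string> file2, int top = 10)
    {
        var c1 = CountWords(file1);
        var c2 = CountWords(file2);

        return c1
            .Where(kv => c2.ContainsKey(kv.Key))
            .Select(kv => new KeyValuePair<string, int>(kv.Key, kv.Value + c2[kv.Key]))
            .OrderByDescending(kv => kv.Value)
            .ThenBy(kv => kv.Key, StringComparer.Ordinal)
            .Take(top)
            .ToList();
    }
}
```

For example, with file1 = { "a", "b", "a", "c" } and file2 = { "a", "c", "c", "d" }, the sketch returns a with 3 and c with 3, while b and d are dropped because each occurs in only one list. If the frequency should instead come from a single file, or be the smaller of the two counts, only the expression inside Select needs to change.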
