c# - Finding common words and their frequencies
Problem description
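I am trying to find the frequencies of the words that two text files have in common. So far I have the code below. I created a class Word that holds a word and its count, but I am having trouble getting the frequencies of the common words. The problem is in the TaskUtils class, but I cannot finish the task because I simply don't know how. Any help would be greatly appreciated.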
TaskUtils class:
public static List<Word> CommonWords(List<string> file1, List<string> file2)
{
    List<Word> allCommonWords = new List<Word>();
    // Keeps only the words that occur exactly once in file1
    var first = file1.GroupBy(x => x).Where(x => x.Count() == 1).SelectMany(x => x);
    int singleWordCount = 0;
    foreach (var word in first)
    {
        if (file2.Contains(word) && allCommonWords.Count < 10)
        {
            // singleWordCount is a running counter (1, 2, 3, ...), not this word's frequency
            singleWordCount++;
            allCommonWords.Add(new Word(word, singleWordCount));
        }
    }
    allCommonWords = allCommonWords.OrderByDescending(x => x.wordCount).ThenBy(x => x.word).ToList<Word>();
    return allCommonWords;
}
Word class:
class Word
{
    public string word { get; set; }
    public int wordCount { get; set; }

    public Word(string word, int wordCount)
    {
        this.word = word;
        this.wordCount = wordCount;
    }
}
Main:
static void Main(string[] args)
{
    string data1 = "Book1.txt";
    string data2 = "Book2.txt";
    string result = "Result.txt";
    File.Delete(result);
    List<string> file1 = InOut.Read(data1);
    List<string> file2 = InOut.Read(data2);
    List<string> uniqueWords = TaskUtils.UniqueWords(file1, file2);
    InOut.PrintUniqueWords(result, uniqueWords);
    List<Word> commonWords = TaskUtils.CommonWords(file1, file2);
    InOut.PrintCommonWords(result, commonWords);
}
Common words output:
Common word count: 7
--------------------------------------
| Nr| Word| Frequency|
--------------------------------------
| 1| prevailed| 7|
| 2| broken| 6|
| 3| sort| 5|
| 4| victory| 4|
| 5| had| 3|
| 6| she| 2|
| 7| but| 1|
--------------------------------------
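InOut.Read is not shown in the question, yet which words count as "common" depends entirely on how the files are split into words. As a rough illustration only, here is a minimal sketch of such a reader, assuming a word is a run of letters (apostrophes allowed inside) and that comparison should be case-insensitive. The class WordReader and its methods are hypothetical, not the asker's InOut class; if the real files hold one word per line, File.ReadAllLines may be all that is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for InOut.Read (the question does not show its implementation).
// Assumption: words are runs of Unicode letters, optionally joined by apostrophes,
// and are compared in lowercase.
public static class WordReader
{
    // Matches e.g. "she", "Victory", "didn't"; skips punctuation, digits and whitespace.
    private static readonly Regex WordPattern = new Regex(@"\p{L}+(?:'\p{L}+)*");

    // Splits raw text into lowercase words.
    public static List<string> Tokenize(string text)
    {
        return WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .ToList();
    }

    // Reads a whole text file and tokenizes its contents.
    public static List<string> Read(string path)
    {
        return Tokenize(File.ReadAllText(path));
    }
}
```

For example, WordReader.Tokenize("She had, but she didn't!") yields she, had, but, she, didn't, so "She" and "she" are counted as the same word.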
Solution
Try using a dictionary and simply incrementing the Count property of the Word class:
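using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading; // Interlocked

public sealed class Word
{
    public string Value { get; private set; }

    private int _count;
    public int Count => _count;

    public Word(string word)
    {
        this.Value = word;
    }

    // Thread-safe increment, so the count stays correct even if the
    // dictionary is later filled in parallel
    public int Increment()
    {
        return Interlocked.Increment(ref _count);
    }
}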
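public static List<Word> CommonWords(List<string> file1, List<string> file2)
{
    var size = Math.Max(file1.Count, file2.Count);
    var dict = new Dictionary<string, Word>(size);
    FillDict(file1, dict);
    FillDict(file2, dict);
    // dict now counts every word from both files; keep only words present in both
    var common = new HashSet<string>(file1);
    common.IntersectWith(file2);
    return dict.Values
        .Where(x => common.Contains(x.Value))
        .OrderByDescending(x => x.Count)
        .ThenBy(x => x.Value)
        .ToList();
}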
private static void FillDict(List<string> list, Dictionary<string, Word> dict)
{
    // Dictionary<,> is not thread-safe: to process the list with Parallel.ForEach,
    // switch to ConcurrentDictionary<,> or guard the lookup/insert with a lock.
    foreach (var item in list)
    {
        // Reuse the existing counter, or create one the first time the word is seen
        if (!dict.TryGetValue(item, out Word word))
        {
            word = new Word(item);
            dict[item] = word;
        }
        word.Increment();
    }
}