Linq to get words in sentences

Problem description

I have a list of words and a list of sentences. I want to know which words can be found in which sentences.

Here is my code:

List<string> sentences = new List<string>();
List<string> words = new List<string>();

sentences.Add("Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.");
sentences.Add("Alea iacta est.");
sentences.Add("Libenter homines id, quod volunt, credunt.");

words.Add("est");
words.Add("homines");

List<string> myResults = sentences
  .Where(sentence => words
     .Any(word => sentence.Contains(word)))
  .ToList();

What I need is a list of tuples, each pairing a sentence with a word that was found in that sentence.
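A minimal sketch of how the asker's own query could be turned into tuples, keeping its `string.Contains` test and swapping `Where` for `SelectMany`. It assumes C# 9 or later (top-level statements). Note that `Contains` is a case-sensitive substring test, so "est" would also match inside a longer word such as "tempestas"; the answer below avoids this by matching whole words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sentences = new List<string>
{
    "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.",
    "Alea iacta est.",
    "Libenter homines id, quod volunt, credunt."
};
var words = new List<string> { "est", "homines" };

// For each sentence, emit one (sentence, word) tuple per word it contains.
// Contains is a case-sensitive substring check, not a whole-word check.
List<Tuple<string, string>> pairs = sentences
    .SelectMany(sentence => words
        .Where(word => sentence.Contains(word))
        .Select(word => Tuple.Create(sentence, word)))
    .ToList();

foreach (var pair in pairs)
    Console.WriteLine($"{pair.Item2}: {pair.Item1}");
```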

Tags: c# string linq

Solution


First, we have to define what a word is. Let it be any combination of letters and apostrophes:

  Regex regex = new Regex(@"[\p{L}']+");
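To see what this definition produces, here is a small sketch (C# 9+ top-level statements assumed) that tokenizes one of the sample sentences with the same pattern. Spaces and punctuation such as commas and the final period are never part of a match, so "credunt." yields the token "credunt".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// A "word" is any run of letters (\p{L}) and apostrophes.
var regex = new Regex(@"[\p{L}']+");

string[] tokens = regex
    .Matches("Libenter homines id, quod volunt, credunt.")
    .Cast<Match>()
    .Select(match => match.Value)
    .ToArray();

// Prints: Libenter|homines|id|quod|volunt|credunt
Console.WriteLine(string.Join("|", tokens));
```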

Second, we should decide how to handle letter case. Let's make the lookup case-insensitive:

  HashSet<string> wordsToFind = new HashSet<string>(StringComparer.OrdinalIgnoreCase) {
    "est",
    "homines"
  };
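A quick check of what the `StringComparer.OrdinalIgnoreCase` comparer buys us (a sketch, C# 9+ top-level statements assumed): lookups ignore case, but they still compare whole strings, so a longer token that merely starts with "est" is not a hit.

```csharp
using System;
using System.Collections.Generic;

var wordsToFind = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "est", "homines" };

bool upper = wordsToFind.Contains("EST");     // true: the comparer ignores case
bool mixed = wordsToFind.Contains("Homines"); // true
bool other = wordsToFind.Contains("esto");    // false: whole-string comparison, not substring

Console.WriteLine($"{upper} {mixed} {other}"); // True True False
```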

Then we can use the Regex to extract the words of each sentence, and Linq to query the sentences.

Code:

  var actualWords = sentences
    .Select((text, index) => new {
      text = text,
      index = index,
      words = regex
        .Matches(text)
        .Cast<Match>()
        .Select(match => match.Value)
        .ToArray()
    })
    .SelectMany(item => item.words
       .Where(word => wordsToFind.Contains(word))
       .Select(word => Tuple.Create(word, item.index + 1)));

  string report = string.Join(Environment.NewLine, actualWords);

  Console.Write(report);

Result:

  (est, 1)         // est appears in the 1st sentence
  (est, 2)         // est appears in the 2nd sentence as well
  (homines, 3)     // homines appears in the 3rd sentence
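For reference, the answer's pieces assembled into one self-contained program (a sketch assuming C# 9+ top-level statements; the query itself is unchanged apart from a `ToList()` at the end). `Tuple<string, int>.ToString()` formats as "(est, 1)", which is why the report looks like the result above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var sentences = new List<string>
{
    "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.",
    "Alea iacta est.",
    "Libenter homines id, quod volunt, credunt."
};

Regex regex = new Regex(@"[\p{L}']+");
var wordsToFind = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "est", "homines" };

List<Tuple<string, int>> actualWords = sentences
    // Keep each sentence's 0-based position alongside its extracted words.
    .Select((text, index) => new {
        text,
        index,
        words = regex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray()
    })
    // Keep only the words we look for; report 1-based sentence numbers.
    .SelectMany(item => item.words
        .Where(word => wordsToFind.Contains(word))
        .Select(word => Tuple.Create(word, item.index + 1)))
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, actualWords));
```

If a word occurs twice in the same sentence it is reported twice; since `Tuple` has value equality, appending `.Distinct()` would collapse such repeats.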

If you want (word, sentence) pairs, i.e. Tuple<string, string>, just change Tuple.Create(word, item.index + 1) to Tuple.Create(word, item.text) in the final Select.
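A sketch of that (word, sentence) variant, assuming C# 9+ top-level statements. Because the sentence index is no longer needed, the first `Select` can drop its index parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var sentences = new List<string>
{
    "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae, aliam Aquitani, tertiam qui ipsorum lingua Celtae, nostra Galli appellantur.",
    "Alea iacta est.",
    "Libenter homines id, quod volunt, credunt."
};

Regex regex = new Regex(@"[\p{L}']+");
var wordsToFind = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "est", "homines" };

// Same query as the answer, but each tuple carries the sentence text instead of its number.
List<Tuple<string, string>> wordSentencePairs = sentences
    .Select(text => new {
        text,
        words = regex.Matches(text).Cast<Match>().Select(m => m.Value).ToArray()
    })
    .SelectMany(item => item.words
        .Where(word => wordsToFind.Contains(word))
        .Select(word => Tuple.Create(word, item.text)))
    .ToList();

foreach (var pair in wordSentencePairs)
    Console.WriteLine(pair);
```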

