C# - Compare 2 lists with custom elements

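Problem description

I have 2 lists. One contains the search elements, the other contains the data. I need to find every element in list2 that contains any string from list1 ("Cat" or "Dog"). For example: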

List<string> list1 = new List<string>();
list1.Add("Cat");
list1.Add("Dog");
// ... ~1000 items

List<string> list2 = new List<string>();
list2.Add("Gray Cat");
list2.Add("Black Cat");
list2.Add("Green Duck");
list2.Add("White Horse");
list2.Add("Yellow Dog Tasmania");
list2.Add("White Horse");
// ... ~1 million items

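My expected listResult is {"Gray Cat", "Black Cat", "Yellow Dog Tasmania"} (because these items contain "Cat" or "Dog" from list1). Apart from nested loops, do you have any idea how to make this run faster?

My current solution is below, but it seems too slow: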

foreach (string str1 in list1)
{
   foreach (string str2 in list2)
   {
      if (str2.Contains(str1))
      {
         listResult.Add(str2);
      }
   }
}

Tags: c#, list, for-loop, iterator

Solution


An excellent use case for parallelization!

LINQ approach without parallelization. Internally this is equivalent to your approach, except that the inner loop stops as soon as one match is found, whereas your approach keeps searching for further matches:

List<string> listResult = list2.Where(x => list1.Any(x.Contains)).ToList();

The same query parallelized with AsParallel(). On a multicore system this can give a large speedup (roughly 4x on the 4-core system measured below):

List<string> listResult = list2.AsParallel().Where(x => list1.Any(x.Contains)).ToList();
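
Note that AsParallel() on its own does not guarantee that the results come back in the order of list2. If the order matters, one option is to add AsOrdered(). The following is a minimal sketch under that assumption; the names ListFilter and FilterParallel are made up for illustration and do not come from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class ListFilter
{
    // Returns every item of `data` that contains at least one of `terms`.
    // AsOrdered() asks PLINQ to keep the results in the original order of
    // `data`; without it, matches may come back in any order.
    public static List<string> FilterParallel(List<string> terms, List<string> data)
    {
        return data.AsParallel()
                   .AsOrdered()
                   .Where(item => terms.Any(term => item.Contains(term)))
                   .ToList();
    }
}
```

Preserving order adds some overhead, so it is probably only worth it when the caller relies on the original sequence.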

Runtime comparison (4-core system, list1 with 1,000 items, list2 with 1,000,000 items):

Without AsParallel(): 91 seconds
With    AsParallel(): 23 seconds
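
Timings like these depend heavily on the hardware and the data. A rough way to repeat the comparison on your own lists is a Stopwatch-based sketch such as the one below. TimingSketch, Measure, and Compare are hypothetical names, and a single run per query is only a coarse measurement:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class, named for illustration only.
public static class TimingSketch
{
    // Runs `query` once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<List<string>> query)
    {
        var stopwatch = Stopwatch.StartNew();
        query();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Times the sequential and the parallel query on the same input lists.
    public static void Compare(List<string> list1, List<string> list2)
    {
        TimeSpan sequential = Measure(() =>
            list2.Where(x => list1.Any(s => x.Contains(s))).ToList());
        TimeSpan parallel = Measure(() =>
            list2.AsParallel().Where(x => list1.Any(s => x.Contains(s))).ToList());

        Console.WriteLine($"Without AsParallel(): {sequential.TotalSeconds:F1} s");
        Console.WriteLine($"With    AsParallel(): {parallel.TotalSeconds:F1} s");
    }
}
```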

An alternative uses Parallel.ForEach with a thread-safe result collection (a ConcurrentBag<string>, which does not preserve the order of list2):

System.Collections.Concurrent.ConcurrentBag<string> listResult = new System.Collections.Concurrent.ConcurrentBag<string>();
System.Threading.Tasks.Parallel.ForEach<string>(list2, str2 =>
{
    foreach (string str1 in list1)
    {
        if (str2.Contains(str1))
        {
            listResult.Add(str2);
            //break the loop if one match was found to avoid duplicates and improve performance
            break;
        }
    }
});

Side note: iterate over list2 in the outer loop and break after the first match; otherwise an item that matches several search strings is added more than once: https://dotnetfiddle.net/VxoRUW
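
To make the duplicate problem concrete, here is a small sequential sketch that contrasts the two loop orders. The class DuplicateDemo, its method names, and the sample item "Cat chasing Dog" are assumptions added for illustration; they are not part of the question's data:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class, named for illustration only.
public static class DuplicateDemo
{
    // Loop order from the question: search terms outside, data inside, no break.
    public static List<string> TermsOuter(List<string> list1, List<string> list2)
    {
        var result = new List<string>();
        foreach (string str1 in list1)
            foreach (string str2 in list2)
                if (str2.Contains(str1))
                    result.Add(str2); // added once per matching term
        return result;
    }

    // Data outside, search terms inside, break after the first match.
    public static List<string> DataOuter(List<string> list1, List<string> list2)
    {
        var result = new List<string>();
        foreach (string str2 in list2)
            foreach (string str1 in list1)
                if (str2.Contains(str1))
                {
                    result.Add(str2); // added at most once per data item
                    break;
                }
        return result;
    }
}
```

With list1 = {"Cat", "Dog"} and list2 = {"Cat chasing Dog", "Gray Cat"}, TermsOuter returns three entries because "Cat chasing Dog" matches both terms, while DataOuter returns two.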

