Why are 1000 threads faster than a few?

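Question

I have a simple program that does a linear search in an array of 2D points. I run 1000 searches against an array of 1 000 000 points.

Oddly, when I spawn 1000 threads, the program runs as fast as when I spawn only as many threads as I have CPU cores, or when I use Parallel.For. This contradicts everything I know about creating threads: creating and destroying threads is supposed to be expensive, but apparently not in this case.

Can someone explain why?

Note: this is a methodological example; the search algorithm is deliberately not optimized. The focus is on threading.

Note 2: I tested on a 4-core i7 and a 3-core AMD, and the results follow the same pattern!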

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

/// <summary>
/// Searches for closest points.
/// For every point in searchData, scan inputData for the closest point
/// and store it at the same index in resultData.
/// </summary>
class Program
{
    class Point
    {
        public double X { get; set; }
        public double Y { get; set; }

        public double GetDistanceFrom (Point p)
        {
            double dx, dy;
            dx = p.X - X;
            dy = p.Y - Y;
            return Math.Sqrt(dx * dx + dy * dy);
        }
    }

    const int inputDataSize = 1_000_000;
    static Point[] inputData = new Point[inputDataSize];

    const int searchDataSize = 1000;
    static Point[] searchData = new Point[searchDataSize];
    static Point[] resultData = new Point[searchDataSize];

    static void GenerateRandomData (Point[] array)
    {
        Random rand = new Random();
        for (int i = 0; i < array.Length; i++)
        {
            array[i] = new Point()
            {
                X = rand.NextDouble() * 100_000,
                Y = rand.NextDouble() * 100_000
            };
        }
    }

    private static void SearchOne(int i)
    {
        var searchPoint = searchData[i];
        foreach (var p in inputData)
        {
            if (resultData[i] == null)
            {
                resultData[i] = p;
            }
            else
            {
                double oldDistance = searchPoint.GetDistanceFrom(resultData[i]);
                double newDistance = searchPoint.GetDistanceFrom(p);
                if (newDistance < oldDistance)
                {
                    resultData[i] = p;
                }
            }
        }
    }

    static void AllThreadSearch()
    {
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < searchDataSize; i++)
        {
            var thread = new Thread(
                obj =>
                {
                    int index = (int)obj;
                    SearchOne(index);
                });
            thread.Start(i);
            threads.Add(thread);
        }
        foreach (var t in threads) t.Join();
    }

    static void FewThreadSearch()
    {
        int threadCount = Environment.ProcessorCount;
        int workSize = searchDataSize / threadCount;
        List<Thread> threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var thread = new Thread(
                obj =>
                {
                    int[] range = (int[])obj;
                    int from = range[0];
                    int to = range[1];
                    for (int index = from; index < to; index++)
                    {
                        SearchOne(index);
                    }
                }
                );
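            int rangeFrom = workSize * i;
            // The last thread also takes the remainder, so no index is skipped
            // when searchDataSize is not divisible by threadCount (e.g. 3 cores).
            int rangeTo = (i == threadCount - 1) ? searchDataSize : workSize * (i + 1);
            thread.Start(new int[]{ rangeFrom, rangeTo });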
            threads.Add(thread);
        }
        foreach (var t in threads) t.Join();
    }

    static void ParallelThreadSearch()
    {
        System.Threading.Tasks.Parallel.For (0, searchDataSize, 
                index =>
                {
                    SearchOne(index);
                });
    }

    static void Main(string[] args)
    {
        Console.Write("Generating data...  ");
        GenerateRandomData(inputData);
        GenerateRandomData(searchData);
        Console.WriteLine("Done.");
        Console.WriteLine();

        Stopwatch watch = new Stopwatch();

        Console.Write("All thread searching... ");
        watch.Restart();
        AllThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.Write("Few thread searching... ");
        watch.Restart();
        FewThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.Write("Parallel thread searching... ");
        watch.Restart();
        ParallelThreadSearch();
        watch.Stop();
        Console.WriteLine($"Done in {watch.ElapsedMilliseconds} ms.");

        Console.WriteLine();
        Console.WriteLine("Press ENTER to quit.");
        Console.ReadLine();
    }
}

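Edit: please make sure to run the application outside the debugger. The VS debugger slows down the case with many threads.

Edit 2: more tests.

To make things clear, here is updated code that guarantees all 1000 threads are running at once: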

public static void AllThreadSearch()
{
    ManualResetEvent startEvent = new ManualResetEvent(false);
    List<Thread> threads = new List<Thread>();
    for (int i = 0; i < searchDataSize; i++)
    {
        var thread = new Thread(
        obj =>
        {
            startEvent.WaitOne();
            int index = (int)obj;
            SearchOne(index);
        });
        thread.Start(i);
        threads.Add(thread);
    }
    startEvent.Set();
    foreach (var t in threads) t.Join();
}

Testing with a smaller array of 100K elements, the results are:

1000 vs 8 threads

               Method |     Mean |    Error |    StdDev | Scaled |
--------------------- |---------:|---------:|----------:|-------:|
      AllThreadSearch | 323.0 ms | 7.307 ms | 21.546 ms |   1.00 |
      FewThreadSearch | 164.9 ms | 3.311 ms |  5.251 ms |   1.00 |
 ParallelThreadSearch | 141.3 ms | 1.503 ms |  1.406 ms |   1.00 |

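Now, as expected, 1000 threads are much slower. Parallel.For still beats them all, which is also logical.

However, when the array grows to 500K elements (that is, the amount of work each thread does), things start to look odd:

1000 vs 8 threads, 500K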

               Method |     Mean |    Error |   StdDev | Scaled |
--------------------- |---------:|---------:|---------:|-------:|
      AllThreadSearch | 890.9 ms | 17.74 ms | 30.61 ms |   1.00 |
      FewThreadSearch | 712.0 ms | 13.97 ms | 20.91 ms |   1.00 |
 ParallelThreadSearch | 714.5 ms | 13.75 ms | 12.19 ms |   1.00 |

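It looks like the cost of context switching is negligible. The thread creation cost is also relatively small. The only significant cost of having too many threads is the loss of memory (memory addresses), and that alone is bad enough.

So, is the cost of creating threads really that low? We are commonly told that creating threads is very bad and that context switches are evil.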
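
One way to put a number on the thread creation question is to time threads that do no work at all. The following is a minimal sketch, not part of the original program: `ThreadCostProbe` and `MeasureCreateStartJoinMs` are hypothetical names, and the timing covers only creating, starting and joining the threads. Run it outside the debugger, as with the main program.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the original program.
static class ThreadCostProbe
{
    // Creates, starts and joins `count` threads with empty bodies and
    // returns the total elapsed wall-clock time in milliseconds.
    public static double MeasureCreateStartJoinMs(int count)
    {
        var threads = new Thread[count];
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            threads[i] = new Thread(() => { }); // empty body: no search work
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }
}
```

Calling `ThreadCostProbe.MeasureCreateStartJoinMs(1000)` and comparing the result with the totals in the tables above gives a rough idea of how much of the run time is thread overhead and how much is search work.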

Tags: c#, .net, multithreading

Answer


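You may want to consider how the application accesses memory. In the maximum-threads scenario you are effectively accessing memory sequentially, which is efficient from a caching point of view. The approach that uses a small number of threads is more random, causing cache misses. Depending on the CPU, there are performance counters that let you measure L1 and L2 cache hits and misses.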
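
If memory access is the suspect, one experiment is to make the traversal order explicit and see whether timings change. The sketch below is an assumption-laden illustration, not the asker's code: `CacheBlockedSearch`, `NearestIndices` and `blockSize` are hypothetical names, points are stored as separate coordinate arrays instead of `Point` objects, and squared distances are compared (which selects the same nearest point as comparing distances). It scans the input in blocks and tests every search point against a block before moving on, so each block is reused while it is likely still in cache.

```csharp
using System;

// Hypothetical helper, not part of the original program.
static class CacheBlockedSearch
{
    // For each query (qx[q], qy[q]) returns the index of the nearest
    // input point (xs[i], ys[i]), scanning the input block by block.
    public static int[] NearestIndices(
        double[] xs, double[] ys, double[] qx, double[] qy, int blockSize)
    {
        var best = new int[qx.Length];
        var bestDist = new double[qx.Length];
        for (int q = 0; q < qx.Length; q++)
        {
            best[q] = -1;
            bestDist[q] = double.PositiveInfinity;
        }

        for (int start = 0; start < xs.Length; start += blockSize)
        {
            int end = Math.Min(start + blockSize, xs.Length);
            // All queries visit the current block before the next one is loaded.
            for (int q = 0; q < qx.Length; q++)
            {
                for (int i = start; i < end; i++)
                {
                    double dx = xs[i] - qx[q];
                    double dy = ys[i] - qy[q];
                    double d = dx * dx + dy * dy; // squared distance
                    if (d < bestDist[q])
                    {
                        bestDist[q] = d;
                        best[q] = i;
                    }
                }
            }
        }
        return best;
    }
}
```

Timing this with different `blockSize` values, alongside the CPU's cache hit/miss counters, would be one way to check whether cache behavior explains the measurements; it does not by itself prove the hypothesis.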

