Multiplying array elements has unexpected performance in C#

Problem description

I want to find the best way to multiply the elements of two arrays. This is part of a wider project where performance is one, but not the only, consideration.

I started writing some functions in C# (LINQPad) today, so nothing has been optimized in any way. The output of the code below is as follows:

Environment.ProcessorCount: 4
Vector<double>.Count: 4

For sequential: 129ms, sum: 2.30619276241231E+25
Plinq: 344ms, sum: 2.30619276241231E+25
Parallel.For: 137ms, 2.30619276241231E+25
Simd sequential: 100ms, sum: 2.30619276241231E+25
Simd parallel: 761ms

This includes the execution time of the multiplication plus a sum of the results as a check. There are some odd results here (I'm a bit rusty with C#, so it may well be my code):

My code is below - there is a reference to the System.Numerics.Vectors NuGet package. I would appreciate any comments, suggestions, corrections or alternatives...

using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Numerics;
using System.Threading.Tasks;

void Main()
{
    var random = new Random();

    var arraySize = 20_000_001;

    var x = new double[arraySize];
    var y = new double[arraySize];

    for (var i = 0; i < x.Length; ++i)
    {
        x[i] = random.Next();
        y[i] = random.Next();
    }

    Console.WriteLine($"Environment.ProcessorCount: {Environment.ProcessorCount}");
    Console.WriteLine($"Vector<double>.Count: {Vector<double>.Count}\n");

    MultiplyFor(x, y);
    MultiplyPlinq(x, y);
    MultiplyParallelFor(x, y);
    MultiplySIMD(x, y);
    MultiplyParallelSIMD(x, y);

}

void MultiplyPlinq(double[] x, double[] y)
{
    var sw = new Stopwatch();

    sw.Start();

    var result = ParallelEnumerable.Range(0, x.Length).Select(i => x[i] * y[i]).ToArray();

    sw.Stop();

    Console.WriteLine($"Plinq: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");
}

double SumCheck(double[] x)
{
    return Math.Round(x.Sum(), 4);
}

void MultiplyFor(double[] x, double[] y)
{
    var result = new double[x.Length];

    var sw = new Stopwatch();

    sw.Start();

    for (var i = 0; i < x.Length; ++i)
    {
        result[i] = x[i] * y[i];
    }

    sw.Stop();

    Console.WriteLine($"For sequential: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");

}

void MultiplyParallelFor(double[] x, double[] y)
{
    var result = new double[x.Length];

    var sw = new Stopwatch();

    sw.Start();

    Parallel.For(0, x.Length, i =>
    {
        result[i] = x[i] * y[i];
    });

    sw.Stop();

    Console.WriteLine($"Parallel.For: {sw.ElapsedMilliseconds}ms, {SumCheck(result)}");

}

void MultiplySIMD(double[] x, double[] y)
{
    var sw = new Stopwatch();

    sw.Start();

    var result = MultiplyByVectors(x, y);

    sw.Stop();

    // 2 cores, 4 logical, 256b register
    Console.WriteLine($"Simd sequential: {sw.ElapsedMilliseconds}ms, sum: {SumCheck(result)}");
}

double[] MultiplyByVectors(double[] x, double[] y)
{
    var result = new double[x.Length];

    var vectorSize = Vector<double>.Count;

    int i;

    // <= so the last full vector is still handled by the SIMD loop
    for (i = 0; i <= x.Length - vectorSize; i += vectorSize)
    {
        var vx = new Vector<double>(x, i);
        var vy = new Vector<double>(y, i);
        (vx * vy).CopyTo(result, i);
    }

    for (; i < x.Length; i++)
    {
        result[i] = x[i] * y[i];
    }

    return result;
}

void MultiplyParallelSIMD(double[] x, double[] y)
{
    var sw = new Stopwatch();

    sw.Start();

    // ceiling division, so the last chunk picks up the remainder
    // (integer division would silently drop the final element here)
    var chunkSize = (x.Length + Environment.ProcessorCount - 1) / Environment.ProcessorCount;

    Parallel.For(0, Environment.ProcessorCount, i => {

        var start = i * chunkSize;
        var take = Math.Min(chunkSize, x.Length - start);
        var xSegment = x.Skip(start).Take(take);
        var ySegment = y.Skip(start).Take(take);
        var result = MultiplyByVectors(xSegment.ToArray(), ySegment.ToArray());

    });

    sw.Stop();

    Console.WriteLine($"Simd parallel: {sw.ElapsedMilliseconds}ms");

}

Tags: c#, arrays, task-parallel-library, simd, parallel.for

Solution


Parallel.For in its simple form is unsuitable for very fine-grained workloads, because the overhead of invoking the anonymous function on every iteration negates the benefit of parallelism (the anonymous function cannot be inlined). The solution is to partition the data, process the partitions in parallel, and handle each partition with a fast direct loop:

Parallel.ForEach(Partitioner.Create(0, x.Length), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});

In the current implementation, the built-in Partitioner creates as many partitions as the number of CPU cores × 3.
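If you want to experiment with the partition granularity yourself, the three-argument overload of Partitioner.Create takes an explicit range size. A small self-contained sketch (the array size and range size here are arbitrary, chosen only for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var x = Enumerable.Range(0, 1_000).Select(i => (double)i).ToArray();
var y = Enumerable.Range(0, 1_000).Select(_ => 2.0).ToArray();
var result = new double[x.Length];

// Partitioner.Create(from, to, rangeSize) yields ranges of a fixed size
// (here 250 elements) instead of the default ProcessorCount * 3 split.
Parallel.ForEach(Partitioner.Create(0, x.Length, 250), range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});

Console.WriteLine(result.Sum());   // 2 * (0 + 1 + ... + 999) = 999000
```

Larger ranges mean less scheduling overhead but coarser load balancing; for a uniform workload like this, fairly large ranges are usually fine.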

Regarding parallelizing the SIMD operations: in my own experiments I did not observe a significant performance improvement on my PC. My theory (and it's just a wild guess, not an educated one) is that the SIMD computation happens so fast that the RAM can't keep up with the speed at which the CPU processes the data.
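As a closing sketch (my own illustration, not code from the question), the same Partitioner ranges can drive the vectorized inner loop directly, writing into a shared result array instead of copying segments out with Skip/Take/ToArray as MultiplyParallelSIMD does. Even so, expect the memory-bandwidth ceiling described above to limit the speedup:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Numerics;
using System.Threading.Tasks;

// n is deliberately not a multiple of Vector<double>.Count,
// so each range has to finish with a scalar remainder loop.
var n = 1_000_003;
var x = Enumerable.Range(0, n).Select(i => (double)(i % 100)).ToArray();
var y = Enumerable.Range(0, n).Select(i => (double)((i + 1) % 100)).ToArray();
var result = new double[n];

var width = Vector<double>.Count;

Parallel.ForEach(Partitioner.Create(0, n), range =>
{
    int i = range.Item1;

    // vectorized body: whole Vector<double> chunks inside this range
    for (; i <= range.Item2 - width; i += width)
    {
        var vx = new Vector<double>(x, i);
        var vy = new Vector<double>(y, i);
        (vx * vy).CopyTo(result, i);
    }

    // scalar tail of this range
    for (; i < range.Item2; i++)
    {
        result[i] = x[i] * y[i];
    }
});
```

Because the ranges are disjoint, each task writes to its own slice of result and no synchronization is needed.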

