Writing then reading threadgroup memory without a barrier

Problem description

I'm looking at the project linked from the "Selecting Device Objects for Compute Processing" page of the Metal documentation (linked here). There I noticed a clever use of threadgroup memory that I'd like to adopt in my own particle simulator. Before I do, however, I need to understand a particular aspect of threadgroup memory and what the developers are doing in this case.

The code contains a segment like this:

// In AAPLKernels.metal

// Parameter of the kernel
threadgroup float4* sharedPosition [[ threadgroup(0)]]

// Body
   ...

    // For each particle / body
    for(i = 0; i < params.numBodies; i += numThreadsInGroup)
    {
        // Because sharedPosition uses the threadgroup address space, 'numThreadsInGroup' elements
        // of sharedPosition will be initialized at once (not just one element at lid as it
        // may look like)
        sharedPosition[threadInGroup] = oldPosition[sourcePosition];

        j = 0;

        while(j < numThreadsInGroup)
        {
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
            acceleration += computeAcceleration(sharedPosition[j++], currentPosition, softeningSqr);
        } // while

        sourcePosition += numThreadsInGroup;
    } // for

In particular, the comment beginning with "Because..." right before the assignment to sharedPosition confuses me. I haven't read anywhere that writes to threadgroup memory happen simultaneously across all threads in the same threadgroup; in fact, I thought a barrier would be required before reading from the shared pool again to avoid undefined behavior, since after the assignment (which is, of course, a write) each thread goes on to read from the entire threadgroup memory pool. Why is no barrier needed here?
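For comparison, the pattern I expected to see is sketched below. This is only my own illustration of where I assumed the synchronization would have to go, not code taken from Apple's sample: the identifiers (numThreadsInGroup, threadInGroup, sourcePosition, etc.) mirror the excerpt above, and the only additions are the two standard threadgroup_barrier(mem_flags::mem_threadgroup) calls from the Metal Shading Language.

    // Sketch (not from the sample): where I assumed barriers would be required.
    for(i = 0; i < params.numBodies; i += numThreadsInGroup)
    {
        // Each thread writes exactly one element of the shared tile.
        sharedPosition[threadInGroup] = oldPosition[sourcePosition];

        // Make the whole tile visible to every thread in the threadgroup
        // before any thread starts reading elements written by other threads.
        threadgroup_barrier(mem_flags::mem_threadgroup);

        for(j = 0; j < numThreadsInGroup; j++)
        {
            acceleration += computeAcceleration(sharedPosition[j], currentPosition, softeningSqr);
        }

        // Ensure every thread has finished reading the tile before the
        // next iteration overwrites it.
        threadgroup_barrier(mem_flags::mem_threadgroup);

        sourcePosition += numThreadsInGroup;
    }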

Tags: swift, gpgpu, metal

Solution
