How can I parallelize this function with OpenMP without race conditions or false sharing?

Problem description

I need to parallelize a function without introducing race conditions or false sharing. I have tried many approaches, but I have not managed to get it working yet. The function is:

__inline static
void calculateClusterCentroIDs(int numCoords, int numObjs, int numClusters, float * dataSetMatrix, int * clusterAssignmentCurrent, float *clustersCentroID) {
    int * clusterMemberCount = (int *) calloc (numClusters, sizeof(int));

    // sum all points
    // for every point
    for (int i = 0; i < numObjs; ++i) {
        // which cluster is it in?
        int activeCluster = clusterAssignmentCurrent[i];

        // update count of members in that cluster
        ++clusterMemberCount[activeCluster];

        // sum point coordinates for finding centroid
        for (int j = 0; j < numCoords; ++j)
            clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
    }


    // now divide each coordinate sum by number of members to find mean/centroid
    // for each cluster
    for (int i = 0; i < numClusters; ++i) {
        if (clusterMemberCount[i] != 0)
            // for each coordinate
            for (int j = 0; j < numCoords; ++j)
                clustersCentroID[i*numCoords + j] /= clusterMemberCount[i];  /// XXXX will divide by zero here for any empty clusters!
    }

    free(clusterMemberCount);
}

Any idea how I could do this?

Thank you.

Tags: c, performance, parallel-processing, openmp

Solution


This part is straightforward:

// sum all points
// for every point
for (int i = 0; i < numObjs; ++i) {
    // which cluster is it in?
    int activeCluster = clusterAssignmentCurrent[i];

    // update count of members in that cluster
    ++clusterMemberCount[activeCluster];

    // sum point coordinates for finding centroid
    #pragma omp parallel for
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}

The inner loop is trivially parallelizable, because all writes go to different elements of clustersCentroID. You can safely assume that the default schedule will not exhibit significant false sharing; its chunks are usually large enough. Just don't try something like schedule(static,1).
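
If you would rather be explicit than rely on the default, a plain schedule(static) with no chunk size gives each thread one large contiguous block of j values, which keeps the threads' writes well apart. This is only an illustrative variant of the loop shown above, not a required change:

// Explicit static schedule: each thread gets one contiguous block of j,
// so writes from different threads land far apart and false sharing is
// unlikely. Behaviour is essentially the same as the default schedule.
#pragma omp parallel for schedule(static)
for (int j = 0; j < numCoords; ++j)
    clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];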

The outer loop is not as easy to parallelize. You could use a reduction over clusterMemberCount and clustersCentroID (a sketch of that variant follows the snippet below), or you could do something like this:

#pragma omp parallel // note NO for
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    // ensure that exactly one thread works on each cluster
    if (activeCluster % omp_get_num_threads() != omp_get_thread_num()) continue;

    ++clusterMemberCount[activeCluster];
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}
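
For reference, the reduction variant mentioned above could look roughly like the sketch below. It assumes OpenMP 4.5 or later, since it relies on array-section reductions, and it gives every thread private copies of both arrays, which costs memory and merge time when numClusters*numCoords is large:

// Sketch of the reduction approach (needs OpenMP 4.5+ array sections).
// Each thread accumulates into private copies of clusterMemberCount and
// clustersCentroID; the copies are summed when the loop finishes.
#pragma omp parallel for \
    reduction(+: clusterMemberCount[:numClusters]) \
    reduction(+: clustersCentroID[:numClusters*numCoords])
for (int i = 0; i < numObjs; ++i) {
    int activeCluster = clusterAssignmentCurrent[i];
    ++clusterMemberCount[activeCluster];
    for (int j = 0; j < numCoords; ++j)
        clustersCentroID[activeCluster*numCoords + j] += dataSetMatrix[i*numCoords + j];
}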

Do this only if the simple solution does not yield sufficient performance.

The other loop is simple as well:

#pragma omp parallel for
for (int i = 0; i < numClusters; ++i) {
    if (clusterMemberCount[i] != 0)
        // for each coordinate
        for (int j = 0; j < numCoords; ++j)
            clustersCentroID[i*numCoords + j] /= clusterMemberCount[i];
}

Again, data access is perfectly isolated, both in terms of correctness and, except for edge cases, false sharing.

