Pragma omp parallel for the inner loop is not ignored in a nested loop

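Problem description

I am trying to implement the following code to see how OpenMP threads are managed across nested loops, where the inner and outer loops are implemented separately, one in a function and one in its caller.

Each loop is implemented with a #pragma omp parallel for statement, and I assumed that the pragma for the inner loop would be ignored.

To check this, I printed the thread number in each loop.

What I see is the following: the thread id in the inner loop is always zero, rather than the thread number of the corresponding caller. Why does this happen?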

Calling 0 from 0
Calling 2 from 1
Calling 6 from 4
Calling 8 from 6
Calling 4 from 2
Calling 7 from 5
Calling 5 from 3
    Calling 0 from 0  // Expecting 3
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
    Calling 0 from 0
    Calling 0 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
Calling 9 from 7
    Calling 1 from 0 // Expecting 7
    Calling 2 from 0
    Calling 3 from 0
    Calling 0 from 0
Calling 3 from 1
    Calling 0 from 0 // Expecting 1
    Calling 1 from 0
    Calling 2 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
    Calling 3 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0
Calling 1 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 0
    Calling 3 from 0

The code that produced this output:

#include <vector>
#include <omp.h>
#include <iostream>
#include <cstdio>
#include <limits>
#include <cstdint>
#include <cinttypes>

using namespace std;

const size_t  kM = 4;

struct Mat
{
 int elem[kM];

 Mat(const Mat& copy)
 {
  for (size_t i = 0; i<kM; ++i)
   this->elem[i] = copy.elem[i];
 }
 Mat()
 {
  for (size_t i = 0; i<kM; ++i)
    elem[i] = 0;
 }

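 // Adds m element-wise into this matrix. The loop carries its own
 // parallel for, which becomes nested when do_mat is called from main's loop.
 void do_mat(const Mat& m)
 {
  #pragma omp parallel for
  for (int i = 0; i < static_cast<int>(kM); ++i)
  {
    // omp_get_thread_num() is the rank within the innermost team.
    printf(" \tCalling %d from %d\n", i, omp_get_thread_num());
    elem[i] += m.elem[i];
  }
 }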
};

int main ()
{
  const int kN = 10;
  vector<Mat> matrices(kN);

  Mat m;
  #pragma omp parallel for
  for (int i = 0; i < kN; i++)
  {
    int tid = omp_get_thread_num();
    printf("Calling %d from %d\n", i, tid);
    matrices[i].do_mat(m);
  }

  return 0;
}

Tags: c++, multithreading, openmp

Solution


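I'm not sure I understand what you expected, but the result you get is exactly what should happen.

By default, OpenMP nested parallelism is disabled. This means that a nested parallel region creates a team of just 1 thread, one such team for each outer-level thread that encounters it.

In your case, the outermost parallel region creates a team of 8 threads. Each of these reaches the innermost parallel region and creates a second-level team of 1 thread. Each second-level thread has rank 0 within its own team, hence the 0s in your printout.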
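If the goal is to see the caller's thread number from inside the inner loop, one option is omp_get_ancestor_thread_num(1), which reports the thread number at nesting level 1 (the outer region). The following is only a minimal sketch of that idea, not the asker's program: the names Ids, probe_ids and print_ids are made up for illustration, the call to omp_set_max_active_levels(1) is there to pin down the "nesting disabled" behaviour described above, and the serial stand-ins in the #else branch are an assumption added so the sketch also builds without -fopenmp.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an addition for this sketch) so it builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_ancestor_thread_num(int) { return 0; }
inline void omp_set_max_active_levels(int) {}
#endif

// Hypothetical record type: the ids observed for one outer iteration.
struct Ids {
    int outer;     // omp_get_thread_num() in the outer region
    int inner;     // omp_get_thread_num() in the inner region
    int ancestor;  // omp_get_ancestor_thread_num(1) in the inner region
};

// Repeats the question's pattern (a parallel for inside a parallel for)
// with only one active level, and records the ids seen per outer iteration.
std::vector<Ids> probe_ids(int outer_iters) {
    assert(outer_iters > 0);
    omp_set_max_active_levels(1);  // inner regions get a team of 1 thread
    std::vector<Ids> seen(outer_iters);
    #pragma omp parallel for
    for (int i = 0; i < outer_iters; ++i) {
        const int outer = omp_get_thread_num();
        #pragma omp parallel for
        for (int j = 0; j < 1; ++j) {
            // Each outer iteration writes only its own slot, so no race.
            seen[i] = Ids{outer, omp_get_thread_num(),
                          omp_get_ancestor_thread_num(1)};
        }
    }
    return seen;
}

// Prints one line per outer iteration, after the parallel work is done.
void print_ids(const std::vector<Ids>& seen) {
    for (std::size_t i = 0; i < seen.size(); ++i)
        std::printf("iteration %zu: outer %d, inner %d, ancestor %d\n",
                    i, seen[i].outer, seen[i].inner, seen[i].ancestor);
}
```

Calling print_ids(probe_ids(10)) from main should show inner as 0 on every line, while ancestor matches outer, which is the number the question was expecting to see.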

With the very same code, compiled with g++ 9.3.0, and with the 2 environment variables OMP_NUM_THREADS and OMP_NESTED set, I get the following:

OMP_NUM_THREADS="2,3" OMP_NESTED=true ./a.out 
Calling 0 from 0
Calling 5 from 1
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 1
    Calling 0 from 0
    Calling 1 from 0
    Calling 3 from 2
    Calling 3 from 2
    Calling 2 from 1
Calling 6 from 1
Calling 1 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 3 from 2
    Calling 2 from 1
Calling 2 from 0
    Calling 0 from 0
    Calling 1 from 0
    Calling 2 from 1
    Calling 3 from 2
    Calling 0 from 0
    Calling 1 from 0
    Calling 3 from 2
    Calling 2 from 1
Calling 3 from 0
Calling 7 from 1
    Calling 0 from 0
    Calling 3 from 2
    Calling 2 from 1
    Calling 3 from 2
    Calling 0 from 0
    Calling 1 from 0
    Calling 1 from 0
    Calling 2 from 1
Calling 4 from 0
Calling 8 from 1
    Calling 0 from 0
    Calling 3 from 2
    Calling 2 from 1
    Calling 2 from 1
    Calling 0 from 0
    Calling 1 from 0
    Calling 3 from 2
    Calling 1 from 0
Calling 9 from 1
    Calling 2 from 1
    Calling 0 from 0
    Calling 1 from 0
    Calling 3 from 2

Maybe this is closer to what you expected?
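The same two-level setup can also be requested from the code instead of the environment. OMP_NESTED is deprecated as of OpenMP 5.0, and omp_set_max_active_levels is the call usually suggested in its place. The sketch below is an illustration under those assumptions rather than a drop-in change to the program above: max_inner_team_size is a made-up helper, the num_threads clauses mirror the "2,3" list used on the command line, and the #else branch holds serial stand-ins so it builds without -fopenmp.

```cpp
#include <cassert>
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (an addition for this sketch) so it builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_num_threads() { return 1; }
inline int omp_get_ancestor_thread_num(int) { return 0; }
inline void omp_set_max_active_levels(int) {}
#endif

// Hypothetical helper: opens an outer and an inner parallel region with the
// requested team sizes and returns the largest inner team size observed.
int max_inner_team_size(int outer_threads, int inner_threads) {
    assert(outer_threads > 0 && inner_threads > 0);
    omp_set_max_active_levels(2);  // allow two active nested levels
    int largest = 0;
    #pragma omp parallel num_threads(outer_threads) reduction(max : largest)
    {
        #pragma omp parallel num_threads(inner_threads) reduction(max : largest)
        {
            const int n = omp_get_num_threads();  // size of the inner team
            std::printf("outer %d -> inner %d of %d\n",
                        omp_get_ancestor_thread_num(1),
                        omp_get_thread_num(), n);
            if (n > largest) largest = n;
        }
    }
    return largest;
}
```

Calling max_inner_team_size(2, 3) asks for the same shape as OMP_NUM_THREADS="2,3". The runtime may hand out fewer threads than requested, so the returned value is best treated as an upper-bounded observation rather than a guarantee.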

