c++ - "#pragma omp parallel for" on the inner loop of a nested loop is not ignored as expected
Question
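I am trying out the code below to see how OpenMP manages threads across nested loops when the outer loop lives in a caller and the inner loop lives in the function it calls.
Each loop is annotated with #pragma omp parallel for, and I assumed the pragma on the inner loop would simply be ignored.
To check this, I print the thread number in each loop.
This is what I get: the thread id in the inner loop is always zero, instead of matching the thread number of the caller. Why is that?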
Calling 0 from 0
Calling 2 from 1
Calling 6 from 4
Calling 8 from 6
Calling 4 from 2
Calling 7 from 5
Calling 5 from 3
Calling 0 from 0 // Expecting 3
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 0 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 9 from 7
Calling 1 from 0 // Expecting 7
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 3 from 1
Calling 0 from 0 // Expecting 1
Calling 1 from 0
Calling 2 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 3 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 1 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
#include <vector>
#include <omp.h>
#include <iostream>
#include <cstdio>
#include <limits>
#include <cstdint>
#include <cinttypes>
using namespace std;
const size_t kM = 4;
struct Mat
{
    int elem[kM];
    Mat(const Mat& copy)
    {
        for (size_t i = 0; i<kM; ++i)
            this->elem[i] = copy.elem[i];
    }
    Mat()
    {
        for (size_t i = 0; i<kM; ++i)
            elem[i] = 0;
    }
    void do_mat(Mat& m)
    {
        #pragma omp parallel for
        for (int i = 0; i<kM; ++i)
        {
            printf(" \tCalling %d from %d\n", i, omp_get_thread_num());
            elem[i] += m.elem[i];
        }
    }
};
int main ()
{
    const int kN = 10;
    vector<Mat> matrices(kN);
    Mat m;
    #pragma omp parallel for
    for (int i = 0; i < kN; i++)
    {
        int tid = omp_get_thread_num();
        printf("Calling %d from %d\n", i, tid);
        matrices[i].do_mat(m);
    }
    return 0;
}
Answer
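I'm not sure I understand what you expected, but the result you got is exactly what should be expected.
By default, OpenMP nested parallelism is disabled. This means that every nested parallel region creates a team of just 1 thread, and there are as many of these 1-thread teams as there are outer-level threads encountering the region.
In your case, the outermost parallel region creates a team of 8 threads. Each of them reaches the innermost parallel region and creates a second-level team of 1 thread. Each of these second-level threads has rank 0 within its own team, hence the 0s you see printed.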
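To make the point about per-team ranks concrete, here is a minimal sketch of two ways to recover the caller's thread id from inside the inner region: capturing it in the outer loop, or asking the runtime with omp_get_ancestor_thread_num(1). The Record struct and the collect_ids helper are hypothetical names introduced only for this illustration, and the serial fallbacks are an assumption added so the sketch still builds without -fopenmp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallbacks, used only when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
static int omp_get_ancestor_thread_num(int) { return 0; }
#endif

// Hypothetical record of the ids seen by one inner-loop iteration.
struct Record {
    int outer_tid;     // id captured by the thread running the outer loop
    int ancestor_tid;  // outer id as queried from inside the inner region
    int inner_tid;     // rank within the inner team
};

// Hypothetical helper: runs the nested loops and stores the ids per iteration.
std::vector<Record> collect_ids(int n_outer, int n_inner)
{
    assert(n_outer >= 0 && n_inner >= 0);
    std::vector<Record> out(static_cast<std::size_t>(n_outer) * n_inner);
    #pragma omp parallel for
    for (int i = 0; i < n_outer; ++i) {
        const int outer_tid = omp_get_thread_num();
        #pragma omp parallel for
        for (int j = 0; j < n_inner; ++j) {
            // Each (i, j) pair writes to its own slot, so there is no data race.
            Record& r = out[static_cast<std::size_t>(i) * n_inner + j];
            r.outer_tid = outer_tid;
            r.ancestor_tid = omp_get_ancestor_thread_num(1);  // level 1 = outer region
            r.inner_tid = omp_get_thread_num();
        }
    }
    return out;
}
```

Calling collect_ids(10, 4) mirrors the loop sizes in the question. With nesting disabled, inner_tid should be 0 in every record, while outer_tid and ancestor_tid both carry the caller's id.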
Compiling the same code with g++ 9.3.0 and setting the two environment variables OMP_NUM_THREADS and OMP_NESTED, I get the following:
OMP_NUM_THREADS="2,3" OMP_NESTED=true ./a.out
Calling 0 from 0
Calling 5 from 1
Calling 0 from 0
Calling 1 from 0
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 3 from 2
Calling 2 from 1
Calling 6 from 1
Calling 1 from 0
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 2 from 1
Calling 2 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 1
Calling 3 from 2
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 2 from 1
Calling 3 from 0
Calling 7 from 1
Calling 0 from 0
Calling 3 from 2
Calling 2 from 1
Calling 3 from 2
Calling 0 from 0
Calling 1 from 0
Calling 1 from 0
Calling 2 from 1
Calling 4 from 0
Calling 8 from 1
Calling 0 from 0
Calling 3 from 2
Calling 2 from 1
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 1 from 0
Calling 9 from 1
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Maybe this is closer to what you expected?
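The same effect can probably be requested from inside the program rather than through environment variables. The sketch below assumes omp_set_max_active_levels(2) is enough to allow two active levels. On older runtimes, enabling nesting (OMP_NESTED=true or omp_set_nested) may also be required. The largest_inner_team function is a hypothetical name used only here, and the serial fallbacks are an assumption for builds without -fopenmp.

```cpp
#include <algorithm>
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed serial fallbacks, used only when OpenMP is not enabled.
static void omp_set_max_active_levels(int) {}
static int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: returns the largest inner team size observed.
int largest_inner_team(int outer_threads, int inner_threads)
{
    assert(outer_threads > 0 && inner_threads > 0);
    omp_set_max_active_levels(2);  // request up to two active nested levels
    int largest = 1;
    #pragma omp parallel num_threads(outer_threads) reduction(max : largest)
    {
        int seen = 1;
        #pragma omp parallel num_threads(inner_threads) reduction(max : seen)
        {
            // Size of the team executing this inner region.
            seen = std::max(seen, omp_get_num_threads());
        }
        largest = std::max(largest, seen);
    }
    return largest;
}
```

Calling largest_inner_team(2, 3) corresponds to OMP_NUM_THREADS="2,3" above. A result of 1 suggests the inner regions were still serialized, and the runtime is free to provide fewer threads than requested. Note that the omp_set_max_active_levels call stays in effect after the function returns.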