c++ - MPI computation gives a wrong result on multiple cores but not on a single core
Question
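I'm new to MPI and am learning it as part of a university course. The task is to find the value of the constant e numerically using MPI_Send() and MPI_Recv(). The only suitable method I found is the series e = 1/0! + 1/1! + 1/2! + ...
I run it on 2, 3 and 4 cores and get wrong numbers, while on 1 core everything is fine. Here is my code: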
#include <iostream>
#include <fstream>
#include <cmath>
#include "mpi.h"
using namespace std;
const int n = 1e04;
double start_time, _time;
int w_size, w_rank, name_len;
char cpu_name[MPI_MAX_PROCESSOR_NAME];
ofstream fout("exp_result", std::ios_base::app | std::ios_base::out);
long double factorial(int num){
    if (num < 1)
        return 1;
    else
        return num * factorial(num - 1);
}
void e_finder(){
    long double sum = 0.0, e = 0.0;
    if(w_rank == 0)
        start_time = MPI_Wtime();
    for(int i = 0; i < n; i+=w_size)
        sum += 1.0 / factorial(i);
    MPI_Send(&sum, 1, MPI_LONG_DOUBLE, 0, 0, MPI_COMM_WORLD);
    if(w_rank == 0){
        // e += sum;
        for (int i = 0; i < w_size; i++){
            MPI_Recv(&sum, 1, MPI_LONG_DOUBLE, i, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            e += sum;
        }
        _time = MPI_Wtime() - start_time;
        cout.precision(29);
        cout << "e = "<< e << endl << fixed << "error is " << abs(e - M_E) << endl;
        cout.precision(9);
        cout << "\nwall clock time = " <<_time << " sec\n";
        fout << w_size << "\t" << _time << endl;
    }
}
int main(int argc, char const *argv[]) {
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &w_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &w_rank);
    MPI_Get_processor_name(cpu_name, &name_len);
    cout<<"calculations started on cpu:" << w_rank << "!\n";
    MPI_Barrier(MPI_COMM_WORLD);
    e_finder();
    MPI_Finalize();
    fout.close();
    return 0;
}
Can someone help me find and understand the mistake? Here is the output:
$ mpirun -np 1 ./exp1
calculations started on cpu:0!
e = 2.718281828459045235428168108
error is 0.00000000000000014463256980957
wall clock time = 4.370553009 sec
$ mpirun -np 2 ./exp1
calculations started on cpu:0!
calculations started on cpu:1!
e = 3.0861612696304875570925407846
error is 0.36787944117144246629694248618
wall clock time = 2.449338411 sec
$ mpirun -np 3 ./exp1
calculations started on cpu:0!
calculations started on cpu:1!
calculations started on cpu:2!
e = 3.5041749401277555767651727958
error is 0.78589311166871048596957449739
wall clock time = 2.011082204 sec
$ mpirun -np 4 ./exp1
calculations started on cpu:0!
calculations started on cpu:3!
calculations started on cpu:1!
calculations started on cpu:2!
e = 4.1667658813667669917037150729
error is 1.44848405290772190090811677443
wall clock time = 1.617427335 sec
Answer
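The problem is in how you distribute the work. It looks like you want each process to compute a share of the terms. However, every process starts at the first term and then takes every w_size-th term. As a result, some terms are computed several times and others are never computed at all. This should be fixed by changing the line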
for(int i = 0; i < n; i+=w_size)
to
for(int i = w_rank; i < n; i+=w_size)
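This makes each process start at a different term, and since each one takes every w_size-th term from there, the sets of terms computed by different processes no longer overlap.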
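The effect of the start index can be checked without MPI by simulating the ranks in a single process. The following is a minimal sketch, not the asker's program: the helper names partial_sum, simulate and coverage are hypothetical, and each term 1/k! is derived from the previous one instead of calling factorial().

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of 1/k! over the indices one rank visits: k = first, first + stride, ...
// The running term is updated incrementally, so no factorial is recomputed.
long double partial_sum(int first, int stride, int n_terms) {
    assert(stride > 0);
    long double sum = 0.0L;
    long double term = 1.0L;  // holds 1/k! for the current k, starting at k = 0
    for (int k = 0; k < n_terms; ++k) {
        if (k >= first && (k - first) % stride == 0)
            sum += term;
        term /= (k + 1);      // advance from 1/k! to 1/(k+1)!
    }
    return sum;
}

// Total that rank 0 would accumulate for `ranks` simulated processes.
// start_at_rank = false mimics the question's loop (every rank starts at 0);
// start_at_rank = true applies the fix (rank r starts at index r).
long double simulate(int ranks, int n_terms, bool start_at_rank) {
    long double e = 0.0L;
    for (int r = 0; r < ranks; ++r)
        e += partial_sum(start_at_rank ? r : 0, ranks, n_terms);
    return e;
}

// Number of simulated ranks that visit each index; shows duplicates and gaps.
std::vector<int> coverage(int ranks, int n_terms, bool start_at_rank) {
    std::vector<int> hits(n_terms, 0);
    for (int r = 0; r < ranks; ++r)
        for (int i = start_at_rank ? r : 0; i < n_terms; i += ranks)
            ++hits[i];
    return hits;
}
```

With two simulated ranks and the original loop, both ranks sum only the even-index terms, so the total is 2 * cosh(1) = e + 1/e, about 3.0862. That agrees with the -np 2 output in the question, where the reported error 0.3678... is 1/e. With start_at_rank set to true, coverage() reports exactly one hit per index and simulate() returns e for any number of ranks.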