c - Trouble with MPI_Reduce
Problem description
This is my first post on the site. I have been reading it for years, and it has always helped me fix my code.
Currently, I'm trying to write an MPI program that splits MPI_COMM_WORLD into two equal halves (comm1 and comm2).
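The split described above (first half of the world ranks in comm1, second half in comm2) can also be expressed with MPI_Comm_split, where each process passes a color naming its half and a key that orders the ranks inside it. Below is a minimal sketch of just the color/rank arithmetic, in plain C so it can be checked without an MPI runtime. half_color and half_rank are hypothetical helper names, and the mapping assumes the same layout as ranks1/ranks2 in the question's code.

```c
/* Hypothetical helper: color for MPI_Comm_split. World ranks 0 .. nproc/2-1
 * get color 0 (comm1), ranks nproc/2 .. 2*(nproc/2)-1 get color 1 (comm2),
 * matching ranks1/ranks2 in the question. A leftover rank when nproc is odd
 * gets -1 (the caller would pass MPI_UNDEFINED instead). */
int half_color(int world_rank, int nproc)
{
    int half = nproc / 2;
    if (world_rank < half)
        return 0;
    if (world_rank < 2 * half)
        return 1;
    return -1;
}

/* Rank the process would get inside its half when world_rank is used as the
 * key, since MPI_Comm_split orders the new ranks by key. -1 if excluded. */
int half_rank(int world_rank, int nproc)
{
    int color = half_color(world_rank, nproc);
    if (color < 0)
        return -1;
    return world_rank - color * (nproc / 2);
}
```

In the MPI program the color would be used as MPI_Comm_split(MPI_COMM_WORLD, color < 0 ? MPI_UNDEFINED : color, my_rank, &half_comm), with half_comm a hypothetical single communicator handle. This replaces the MPI_Group_incl/MPI_Comm_create pair with one collective call, and processes that pass MPI_UNDEFINED receive MPI_COMM_NULL.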
In comm1, I use MPI_Scatter() to split a vector among all processes, compute each process's mean, and use MPI_Reduce() to compute the overall mean.
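The comm1 computation works because the global mean can be assembled from per-chunk pieces: if each of the P equal chunks contributes sum(chunk)/N, the contributions add up to the mean of the whole vector. (If each process instead computed its true local mean, sum(chunk)/count, the reduced sum would still have to be divided by P.) Here is a serial sketch of that decomposition, without MPI. chunk_contribution and mean_by_chunks are hypothetical names, and N is assumed to be divisible by the number of chunks, as N/(nproc/2) in the question's code also assumes.

```c
/* Serial model (no MPI) of the comm1 computation: the vector of length n is
 * cut into nchunks equal chunks, as MPI_Scatter would do. Each "process"
 * computes sum(chunk) / n, and summing those contributions (the role of
 * MPI_Reduce with MPI_SUM) gives the global mean. Assumes n % nchunks == 0. */
double chunk_contribution(const double *chunk, int count, int n)
{
    double total = 0.0;
    for (int i = 0; i < count; i++)
        total += chunk[i];
    return total / n;
}

double mean_by_chunks(const double *v, int n, int nchunks)
{
    int count = n / nchunks;   /* elements per chunk */
    double mean = 0.0;
    for (int p = 0; p < nchunks; p++)
        mean += chunk_contribution(v + p * count, count, n);
    return mean;
}
```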
After that, I use MPI_Bcast() to send the computed mean to all processes in MPI_COMM_WORLD.
In comm2, I use MPI_Scatter() to split the same vector among all processes, compute each process's variance, and use MPI_Reduce() to compute the overall variance.
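The variance in comm2 decomposes the same way once the global mean is known, which is why the mean is computed and broadcast first: each chunk contributes sum((x - mean)^2)/(N - 1), and the contributions add up to the sample variance. A serial sketch under the same assumptions (hypothetical names, no MPI, N divisible by the chunk count):

```c
/* Serial model (no MPI) of the comm2 computation: with the global mean
 * already known, each chunk contributes sum((x - mean)^2) / (n - 1), and
 * summing the contributions gives the sample variance of the whole vector.
 * Assumes n % nchunks == 0 and n > 1. */
double chunk_var_contribution(const double *chunk, int count, double mean, int n)
{
    double total = 0.0;
    for (int i = 0; i < count; i++) {
        double diff = chunk[i] - mean;
        total += diff * diff;
    }
    return total / (n - 1);
}

double var_by_chunks(const double *v, int n, int nchunks, double mean)
{
    int count = n / nchunks;   /* elements per chunk */
    double var = 0.0;
    for (int p = 0; p < nchunks; p++)
        var += chunk_var_contribution(v + p * count, count, mean, n);
    return var;
}
```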
The root process of each communicator prints the computed value.
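My current problem is that every process computes its mean correctly, but MPI_Reduce() ignores the value from the first process. The same thing happens with the variance.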
I also get some segmentation faults, which I think are related to the same bug.
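For reference, the count argument of MPI_Reduce is the number of elements in each process's send buffer (and in the root's receive buffer), and with MPI_SUM the buffers are combined element by element. It is not the number of participating processes. The serial model below (not MPI itself; model_reduce_sum is a hypothetical name) sketches that semantics: reducing one double per process, such as local_sum, corresponds to count = 1.

```c
/* Serial model (no MPI) of MPI_Reduce(sendbuf, recvbuf, count, MPI_DOUBLE,
 * MPI_SUM, root, comm) over nprocs processes. sendbufs holds nprocs buffers
 * of count doubles each (process p's buffer starts at sendbufs + p * count);
 * the root receives count doubles, combined element by element. */
void model_reduce_sum(const double *sendbufs, int nprocs, int count, double *recvbuf)
{
    for (int k = 0; k < count; k++) {
        recvbuf[k] = 0.0;
        for (int p = 0; p < nprocs; p++)
            recvbuf[k] += sendbufs[p * count + k];
    }
}
```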
My code (when computing the variance I hard-coded the correct mean, to make sure the same "error" still shows up):
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <mpi.h>

#define N 100

double compute_mean(double *v, int count);
double compute_var(double *v, double mean, int count);

int main(int argc, char **argv)
{
    MPI_Comm comm1, comm2;
    MPI_Group world_group, group1, group2;
    int i, my_rank, new_rank = -1, nproc;
    double mean, sigma, local_sum, *data, *local_data;
    int *ranks1, *ranks2;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    // World ranks 0 .. nproc/2-1 form group1, ranks nproc/2 .. 2*(nproc/2)-1 form group2
    ranks1 = (int*) malloc((nproc/2) * sizeof(int));
    ranks2 = (int*) malloc((nproc/2) * sizeof(int));
    for (i = 0; i < nproc/2; i++)
    {
        ranks1[i] = i;
        ranks2[i] = nproc/2 + i;
    }
    MPI_Group_incl(world_group, nproc/2, ranks1, &group1);
    MPI_Group_incl(world_group, nproc/2, ranks2, &group2);
    MPI_Comm_create(MPI_COMM_WORLD, group1, &comm1);
    MPI_Comm_create(MPI_COMM_WORLD, group2, &comm2);

    // Rank of this process inside the sub-communicator it belongs to
    if (comm1 != MPI_COMM_NULL)
        MPI_Comm_rank(comm1, &new_rank);
    if (comm2 != MPI_COMM_NULL)
        MPI_Comm_rank(comm2, &new_rank);

    // Rank 0 of each sub-communicator reads the whole vector
    if (new_rank == 0)
    {
        printf("Reading file - process %d\n", my_rank);
        data = (double*) malloc(N * sizeof(double));
        FILE *f = fopen("input_vec.dat", "r");
        for (i = 0; i < N; i++) {
            fscanf(f, "%lf\n", &data[i]);
        }
        fclose(f);
        free(f);
    }

    // Each half scatters the vector in chunks of N/(nproc/2) elements
    local_data = (double*) malloc((N/(nproc/2)) * sizeof(double));
    local_sum = 0;
    mean = 0;
    if (comm1 != MPI_COMM_NULL) {
        MPI_Scatter(data, N/(nproc/2), MPI_DOUBLE, local_data, N/(nproc/2), MPI_DOUBLE, 0, comm1);
    }
    if (comm2 != MPI_COMM_NULL) {
        MPI_Scatter(data, N/(nproc/2), MPI_DOUBLE, local_data, N/(nproc/2), MPI_DOUBLE, 0, comm2);
    }

    // comm1: per-process contributions to the mean, reduced with MPI_SUM onto rank 0 of comm1
    if (comm1 != MPI_COMM_NULL)
    {
        local_sum = compute_mean(local_data, N/(nproc/2));
        printf("(comm1) - Local mean for process %d is: %f\n", new_rank, local_sum);
        MPI_Reduce(&local_sum, &mean, nproc/2, MPI_DOUBLE, MPI_SUM, 0, comm1);
    }

    // World rank 0 (rank 0 of comm1) broadcasts the mean to every process
    MPI_Bcast(&mean, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    if (comm1 != MPI_COMM_NULL && new_rank == 0)
    {
        printf("(comm1) - The mean is: %f\n", mean);
    }

    // comm2: per-process contributions to the variance, using the hard-coded mean
    if (comm2 != MPI_COMM_NULL)
    {
        local_sum = compute_var(local_data, 0.091529, N/(nproc/2));
        printf("(comm2) - Local variance for process %d is: %f\n", new_rank, local_sum);
        MPI_Reduce(&local_sum, &sigma, nproc/2, MPI_DOUBLE, MPI_SUM, 0, comm2);
    }
    if (comm2 != MPI_COMM_NULL && new_rank == 0)
    {
        printf("(comm2) - The variance is: %f\n", sigma);
    }

    free(ranks1);
    free(ranks2);
    if (new_rank == 0) free(data);
    free(local_data);
    //MPI_Comm_free(&comm1);
    //MPI_Comm_free(&comm2);
    MPI_Finalize();
    return 0;
}

// Divides by N (the full vector length), not by count, so the result is
// this chunk's contribution to the mean of the whole vector
double compute_mean(double *v, int count)
{
    int i;
    double total = 0;
    for (i = 0; i < count; i++)
        total += v[i];
    total /= N;
    return total;
}

// Divides by N-1, so the result is this chunk's contribution to the
// sample variance of the whole vector
double compute_var(double *v, double mean, int count)
{
    int i;
    double total = 0;
    for (i = 0; i < count; i++) {
        double diff = v[i] - mean;
        total += diff * diff;
    }
    total /= N - 1;
    return total;
}
Output (rearranged to make it easier to read; I don't think that matters?):
Reading file - process 0
(comm1) - Local mean for process 0 is: 0.019051
(comm1) - Local mean for process 1 is: 0.021419
(comm1) - Local mean for process 2 is: 0.024029
(comm1) - Local mean for process 3 is: 0.027030
(comm1) - The mean is: 0.072478 (EDIT ---> correct is 0.091529)
Reading file - process 4
(comm2) - Local variance for process 0 is: 0.000061
(comm2) - Local variance for process 1 is: 0.000011
(comm2) - Local variance for process 2 is: 0.000008
(comm2) - Local variance for process 3 is: 0.000073
(comm2) - The variance is: 0.000092 (EDIT ---> correct is 0.000153)
[DESKTOP-FVPJMLQ:02619] *** Process received signal ***
[DESKTOP-FVPJMLQ:02619] Signal: Segmentation fault (11)
[DESKTOP-FVPJMLQ:02619] Signal code: (128)
[DESKTOP-FVPJMLQ:02619] Failing at address: (nil)
[DESKTOP-FVPJMLQ:02629] *** Process received signal ***
[DESKTOP-FVPJMLQ:02629] Signal: Segmentation fault (11)
[DESKTOP-FVPJMLQ:02629] Signal code: (128)
[DESKTOP-FVPJMLQ:02629] Failing at address: (nil)
[DESKTOP-FVPJMLQ:02619] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7ffe1ad3ef20]
[DESKTOP-FVPJMLQ:02619] [ 1] [DESKTOP-FVPJMLQ:02629] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fceea13ef20]
[DESKTOP-FVPJMLQ:02629] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)/lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)[0x7ffe1ad9798d]
[DESKTOP-FVPJMLQ:02619] [ 2] ex2(+0x1309)[0x7ffe1b801309]
[DESKTOP-FVPJMLQ:02619] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7ffe1ad21b97]
[DESKTOP-FVPJMLQ:02619] [ 4] ex2(+0xc5a)[0x7ffe1b800c5a]
[DESKTOP-FVPJMLQ:02619] *** End of error message ***
[0x7fceea19798d]
[DESKTOP-FVPJMLQ:02629] [ 2] ex2(+0x12ea)[0x7fceeae012ea]
[DESKTOP-FVPJMLQ:02629] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fceea121b97]
[DESKTOP-FVPJMLQ:02629] [ 4] ex2(+0xc5a)[0x7fceeae00c5a]
[DESKTOP-FVPJMLQ:02629] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node DESKTOP-FVPJMLQ exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[DESKTOP-FVPJMLQ:02614] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
[DESKTOP-FVPJMLQ:02614] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
With a serial program, the correct output is:
Mean: 0.091529
Variance: 0.000153
For completeness, I'll attach the file input_vec.dat.
As you can see, both the mean and the variance come out wrong, and I really don't know why.