Trouble with MPI_Reduce

Problem description

This is my first post on this site. I have been reading it for years, and it has always helped me fix my code.

I am currently trying to write an MPI program that splits MPI_COMM_WORLD into two equal halves (comm1 and comm2).
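
To make the intended split concrete, here is a small plain-C sketch (no MPI calls) of how each world rank would map to one half and to a rank inside that half. The helper names `half_of`, `rank_in_half` and `print_split` are made up for illustration, and the sketch assumes an even number of processes, as the program does.

```c
#include <stdio.h>

/* Hypothetical helper: 0 for the first half (comm1), 1 for the second (comm2).
   Assumes nproc is even. */
int half_of(int world_rank, int nproc)
{
    return world_rank < nproc / 2 ? 0 : 1;
}

/* Hypothetical helper: rank of the process inside its own half. */
int rank_in_half(int world_rank, int nproc)
{
    return world_rank % (nproc / 2);
}

/* Print the mapping for every world rank. */
void print_split(int nproc)
{
    for (int r = 0; r < nproc; r++)
        printf("world rank %d -> comm%d, new rank %d\n",
               r, half_of(r, nproc) + 1, rank_in_half(r, nproc));
}
```

With 8 processes, world rank 4 becomes rank 0 of comm2, which matches the "Reading file - process 4" line in the output further down. The same 0/1 value could presumably also serve as the `color` argument of `MPI_Comm_split`, though the program here builds the communicators from groups instead.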

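In comm1, I use MPI_Scatter() to split a vector across all processes, compute each process's partial mean, and use MPI_Reduce() to obtain the overall mean.

After that, I use MPI_Bcast() to send the computed mean to every process in MPI_COMM_WORLD.

In comm2, I use MPI_Scatter() to split the same vector across all processes, compute each process's partial variance, and use MPI_Reduce() to obtain the overall variance.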
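
The arithmetic behind this plan can be sketched serially, without MPI. Each chunk contributes its sum divided by the total element count (or by the total count minus one for the variance), so simply adding the contributions gives the global value. The function names below are hypothetical, and the sketch assumes the vector length is divisible by the number of chunks.

```c
/* Contribution of one chunk to the global mean: the chunk sum divided by the
   TOTAL number of elements n, not by the chunk length. */
double partial_mean(const double *v, int count, int n)
{
    double total = 0.0;
    for (int i = 0; i < count; i++)
        total += v[i];
    return total / n;
}

/* Contribution of one chunk to the sample variance (divisor n - 1). */
double partial_var(const double *v, double mean, int count, int n)
{
    double total = 0.0;
    for (int i = 0; i < count; i++) {
        double diff = v[i] - mean;
        total += diff * diff;
    }
    return total / (n - 1);
}

/* Serial stand-in for scatter + sum-reduce: split v into `parts` equal
   chunks and add up the per-chunk contributions. */
double chunked_mean(const double *v, int n, int parts)
{
    int chunk = n / parts;
    double mean = 0.0;
    for (int p = 0; p < parts; p++)
        mean += partial_mean(v + p * chunk, chunk, n);
    return mean;
}

double chunked_var(const double *v, double mean, int n, int parts)
{
    int chunk = n / parts;
    double var = 0.0;
    for (int p = 0; p < parts; p++)
        var += partial_var(v + p * chunk, mean, chunk, n);
    return var;
}
```

Because the variance needs the global mean first, the mean has to be known to every process before the variance step, which is what the MPI_Bcast() call is for.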

The root process of each communicator prints the computed value.


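My current problem is that every process computes its partial mean correctly, but MPI_Reduce() ignores the value from the first process. The same thing happens with the variance.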

I also get some segmentation faults, which I think are related to my mistake.


My code (when computing the variance I hard-coded the mean, to make sure the same "error" occurs):


    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>
    
    #include <mpi.h>
    
    #define N 100
    
    
    double compute_mean (double* v, int count);
    double compute_var  (double *v, double mean, int count);
    
    
    int main (int argc, char **argv)
    {
      MPI_Comm comm1, comm2;
      MPI_Group world_group, group1, group2;    
      int i, my_rank, new_rank = -1, nproc;
      
      double mean, sigma, local_sum,*data,*local_data;  
    
      int *ranks1, *ranks2;
      
    
      MPI_Init(&argc,&argv);
      MPI_Comm_size(MPI_COMM_WORLD,&nproc);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
      
    
      MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    
      ranks1 = (int*) malloc ((nproc/2)*sizeof(int));
      ranks2 = (int*) malloc ((nproc/2)*sizeof(int));
      
      for(i = 0; i < nproc/2; i++)
      {
          ranks1[i] = i;
          ranks2[i] = nproc/2 + i;
      }
    
      MPI_Group_incl(world_group, nproc/2, ranks1, &group1);
      MPI_Group_incl(world_group, nproc/2, ranks2, &group2);
        
      MPI_Comm_create(MPI_COMM_WORLD, group1, &comm1);
      MPI_Comm_create(MPI_COMM_WORLD, group2, &comm2);
    
      if(comm1 != MPI_COMM_NULL)
        MPI_Comm_rank(comm1, &new_rank);
      if(comm2 != MPI_COMM_NULL)
        MPI_Comm_rank(comm2, &new_rank);
    
    
    if(new_rank == 0)
    {
        printf("Reading file - process %d\n", my_rank);
        data = (double*) malloc (N * sizeof(double));
        FILE *f = fopen("input_vec.dat","r");
        for(i = 0; i < N; i++){
          fscanf(f, "%lf\n", &data[i]);
        }
        fclose(f);
        free(f);
    }
    
    local_data = (double*) malloc ((N/(nproc/2))*sizeof(double));
    local_sum = 0;
    mean = 0;
      
    if(comm1 != MPI_COMM_NULL){
      MPI_Scatter(data,N/(nproc/2),MPI_DOUBLE,local_data,N/(nproc/2),MPI_DOUBLE,0,comm1);
    }
    if(comm2 != MPI_COMM_NULL){
      MPI_Scatter(data,N/(nproc/2),MPI_DOUBLE,local_data,N/(nproc/2),MPI_DOUBLE,0,comm2);
    }
    
     if(comm1 != MPI_COMM_NULL)
     {
       local_sum = compute_mean(local_data,N/(nproc/2));
       printf("(comm1) - Local mean for process %d is: %f\n", new_rank, local_sum);
       MPI_Reduce(&local_sum, &mean, nproc/2, MPI_DOUBLE, MPI_SUM, 0, comm1);
     }
    
    MPI_Bcast(&mean,1,MPI_DOUBLE,0,MPI_COMM_WORLD);
    
    if(comm1 != MPI_COMM_NULL && new_rank == 0)
    {
      printf("(comm1) - The mean is: %f\n",mean);
    }
      
    
    if(comm2 != MPI_COMM_NULL)
    {
      local_sum = compute_var(local_data, 0.091529, N/(nproc/2));
      printf("(comm2) - Local variance for process %d is: %f\n", new_rank, local_sum);
      MPI_Reduce(&local_sum,&sigma,nproc/2,MPI_DOUBLE, MPI_SUM, 0, comm2);
    }
    
    
    if(comm2 != MPI_COMM_NULL && new_rank == 0)
    {
      printf("(comm2) - The variance is: %f\n", sigma);
    }
    
    
    free(ranks1);
    free(ranks2);
    if (new_rank == 0) free(data);
    free(local_data);
    //MPI_Comm_free(&comm1);
    //MPI_Comm_free(&comm2);
    
      MPI_Finalize();
      return 0;
    }
    
    double compute_mean(double* v, int count)
    {
      int i;
      double total  = 0;
      for(i = 0; i < count; i++)
        total += v[i];
      total /= N;
      return total;
    }
    
    double compute_var(double* v, double mean, int count)
    {
      int i;
      double total = 0;
      for(i = 0; i < count; i++){
        double diff = v[i] - mean;
        total += diff*diff;
      }
      total /= N-1;
      return total;
    }
    

Output (rearranged to make it easier to read; I assume the order doesn't matter?):

Reading file - process 0
(comm1) - Local mean for process 0 is: 0.019051
(comm1) - Local mean for process 1 is: 0.021419
(comm1) - Local mean for process 2 is: 0.024029
(comm1) - Local mean for process 3 is: 0.027030
(comm1) - The mean is: 0.072478                         (EDIT ---> correct is 0.091529)

Reading file - process 4
(comm2) - Local variance for process 0 is: 0.000061
(comm2) - Local variance for process 1 is: 0.000011
(comm2) - Local variance for process 2 is: 0.000008
(comm2) - Local variance for process 3 is: 0.000073
(comm2) - The variance is: 0.000092                     (EDIT ---> correct is 0.000153)

[DESKTOP-FVPJMLQ:02619] *** Process received signal ***
[DESKTOP-FVPJMLQ:02619] Signal: Segmentation fault (11)
[DESKTOP-FVPJMLQ:02619] Signal code:  (128)
[DESKTOP-FVPJMLQ:02619] Failing at address: (nil)
[DESKTOP-FVPJMLQ:02629] *** Process received signal ***
[DESKTOP-FVPJMLQ:02629] Signal: Segmentation fault (11)
[DESKTOP-FVPJMLQ:02629] Signal code:  (128)
[DESKTOP-FVPJMLQ:02629] Failing at address: (nil)
[DESKTOP-FVPJMLQ:02619] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7ffe1ad3ef20]
[DESKTOP-FVPJMLQ:02619] [ 1] [DESKTOP-FVPJMLQ:02629] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7fceea13ef20]
[DESKTOP-FVPJMLQ:02629] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)/lib/x86_64-linux-gnu/libc.so.6(cfree+0x3d)[0x7ffe1ad9798d]
[DESKTOP-FVPJMLQ:02619] [ 2] ex2(+0x1309)[0x7ffe1b801309]
[DESKTOP-FVPJMLQ:02619] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7ffe1ad21b97]
[DESKTOP-FVPJMLQ:02619] [ 4] ex2(+0xc5a)[0x7ffe1b800c5a]
[DESKTOP-FVPJMLQ:02619] *** End of error message ***
[0x7fceea19798d]
[DESKTOP-FVPJMLQ:02629] [ 2] ex2(+0x12ea)[0x7fceeae012ea]
[DESKTOP-FVPJMLQ:02629] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fceea121b97]
[DESKTOP-FVPJMLQ:02629] [ 4] ex2(+0xc5a)[0x7fceeae00c5a]
[DESKTOP-FVPJMLQ:02629] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node DESKTOP-FVPJMLQ exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
[DESKTOP-FVPJMLQ:02614] 7 more processes have sent help message help-btl-vader.txt / cma-permission-denied
[DESKTOP-FVPJMLQ:02614] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

With a serial program, the correct output would be:

Mean: 0.091529
Variance: 0.000153

For completeness, I am attaching the file input_vec.dat.


As you can see, it fails for both the mean and the variance, and I really don't know why.

Tags: c, mpi

Solution
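
One likely cause, judging from the code above: the third argument of MPI_Reduce() is the number of elements in each process's send buffer, not the number of processes. Both `local_sum` and `mean` (or `sigma`) are single doubles, so the count should be 1. Passing `nproc/2` makes the call read and write past those variables, which is undefined behaviour and could explain why one contribution goes missing. The plain-C sketch below imitates an element-wise sum reduction; `sum_reduce` and `reduce_local_means` are hypothetical stand-ins, not MPI functions.

```c
/* Hypothetical serial stand-in for MPI_Reduce with MPI_SUM.
   `send` holds the contributions of all processes back to back
   (process p owns send[p*count .. p*count + count - 1]);
   `recv` receives `count` element-wise sums. */
void sum_reduce(const double *send, double *recv, int count, int nprocs)
{
    for (int i = 0; i < count; i++) {
        recv[i] = 0.0;
        for (int p = 0; p < nprocs; p++)
            recv[i] += send[p * count + i];
    }
}

/* The four local means printed by comm1 in the question. */
double reduce_local_means(void)
{
    const double local[4] = {0.019051, 0.021419, 0.024029, 0.027030};
    double mean;
    sum_reduce(local, &mean, 1, 4);   /* count = 1: one double per process */
    return mean;
}
```

The four printed local means add up to 0.091529 and the four local variances to 0.000153, the serial results, so the per-process values look fine and only the reduction call is off. In the program this would correspond to `MPI_Reduce(&local_sum, &mean, 1, MPI_DOUBLE, MPI_SUM, 0, comm1);`, and likewise with `&sigma` and `comm2`.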

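
The segmentation fault is probably a separate issue. The backtrace ends in `cfree`, and the code calls `free(f)` right after `fclose(f)`. A `FILE *` obtained from `fopen` is released by `fclose` and must not be passed to `free`. Here is a minimal sketch of the read step with that call removed and some error checks added; the helper name `read_vector` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read n doubles from a text file into a new array.
   Returns NULL on failure; the caller releases the array with free(). */
double *read_vector(const char *path, int n)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    double *data = malloc(n * sizeof *data);
    if (data == NULL) {
        fclose(f);
        return NULL;
    }

    for (int i = 0; i < n; i++) {
        if (fscanf(f, "%lf", &data[i]) != 1) {   /* short or malformed file */
            free(data);
            fclose(f);
            return NULL;
        }
    }

    fclose(f);   /* fclose releases the stream; do not call free(f) */
    return data;
}
```

In the program, the root of each communicator could call something like `data = read_vector("input_vec.dat", N);` and stop if the result is NULL. Initialising `data` to NULL on the other ranks would also keep an indeterminate pointer from being passed to MPI_Scatter().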
