MPI program does not run when the number of nodes is greater than 3

Problem description

With the number of nodes set to 3, the program runs normally when I execute the following commands:

[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu04 helloworld
[gpu04:16537] [[37424,0],2] remote spawn is NULL!
[gpu03:01562] [[37424,0],1] remote spawn is NULL!
Hello World! Process 1 of 4 on gpu02
Hello World! Process 3 of 4 on gpu04
Hello World! Process 0 of 4 on gpu02
Hello World! Process 2 of 4 on gpu03

[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu05 helloworld
[gpu03:01597] [[37381,0],1] remote spawn is NULL!
[gpu05:26312] [[37381,0],2] remote spawn is NULL!
Hello World! Process 0 of 4 on gpu02
Hello World! Process 1 of 4 on gpu02
Hello World! Process 2 of 4 on gpu03
Hello World! Process 3 of 4 on gpu05

But when the number of nodes is 4, the program neither runs nor exits; it just hangs until I press Ctrl+C:

[changmx@gpu02 mpiTest]$ mpiexec -n 4 -host gpu02,gpu03,gpu04,gpu05 helloworld
[gpu04:16671] [[37833,0],2] remote spawn is NULL!
[gpu03:01731] [[37833,0],1] remote spawn is NULL!

Here is my source code:

#include <stdio.h>
#include <string.h>
#include <math.h>

#include <mpi.h>

#include <cuda_runtime.h>
#include <device_launch_parameters.h>

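int main(int argc, char *argv[])
{
    int myrank, numprocs;
    int namelen;
    /* MPI_Get_processor_name needs a buffer of at least MPI_MAX_PROCESSOR_NAME chars. */
    char process_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* total number of processes */

    /* On return, namelen holds the length of the host name. */
    MPI_Get_processor_name(process_name, &namelen);

    printf("Hello World! Process %d of %d on %s\n", myrank, numprocs, process_name);

    MPI_Finalize();
    return 0;
}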

My Open MPI version is 1.8.8.

Tags: c++, c, mpi, openmpi

Solution


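This problem was most likely caused by an incorrect Open MPI installation on my side. After I switched to a different Open MPI version, the problem no longer occurred.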

