MPI_Scatterv (C) gives a segmentation fault

Problem description

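I have built a fairly simple C program that reads a PGM image, splits it into parts, and sends them to the individual cores for processing.

To account for a processing margin (each core must read a larger region of the image than the one it writes), I cannot simply split the image; I first have to build an array that includes the margins mentioned above.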

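A simple example: the image is 1600x1200 (width x height), I have 2 cores, and I want to access a 3x3 region centered on each pixel. Splitting the image one horizontal line at a time, the subdivision is: the first core gets pixels 0 to 601*1600, and the second core gets pixels 599*1600 to 1200*1600.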
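The arithmetic in this example can be written out as a small helper. This is only a sketch: `RowRange`, `halo_range`, and `print_partition` are hypothetical names that do not appear in the question, and the range is clamped to the image for every rank rather than only when the owned block touches the first or last row, as the posted code does.

```c
#include <assert.h>
#include <stdio.h>

/* Half-open row range [first, last) that one rank needs to read. */
typedef struct {
    int first;
    int last;
} RowRange;

/* Rows owned by `rank` are [rank*flo, (rank+1)*flo), with the last rank
   taking the remainder. The range is then widened by the kernel
   half-height (m - 1) / 2 and clamped to the image. */
RowRange halo_range(int rank, int ranks, int ysize, int m)
{
    assert(ranks > 0 && rank >= 0 && rank < ranks);
    int flo = ysize / ranks;            /* integer division already floors */
    int halo = (m - 1) / 2;
    int start = rank * flo;
    int end = (rank == ranks - 1) ? ysize : (rank + 1) * flo;

    RowRange r;
    r.first = (start - halo < 0) ? 0 : start - halo;
    r.last = (end + halo > ysize) ? ysize : end + halo;
    return r;
}

/* Print the row range and the matching pixel offsets for every rank. */
void print_partition(int ranks, int xsize, int ysize, int m)
{
    for (int rank = 0; rank < ranks; rank++) {
        RowRange r = halo_range(rank, ranks, ysize, m);
        printf("rank %d: rows [%d, %d), pixels [%d, %d)\n",
               rank, r.first, r.last, r.first * xsize, r.last * xsize);
    }
}
```

For the 1600x1200 image with 2 ranks and m = 3, `halo_range` gives rows [0, 601) for rank 0 and rows [599, 1200) for rank 1, matching the split described above.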

Now, I believe there is nothing wrong with the way I implemented this in my program, but I still get this error:

[ct1pt-tnode003:22389:0:22389] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7ffe7f60ead8)
==== backtrace (tid:  22389) ====
 0 0x000000000004ee05 ucs_debug_print_backtrace()  ???:0
 1 0x0000000000402624 main()  ???:0
 2 0x0000000000022505 __libc_start_main()  ???:0
 3 0x0000000000400d99 _start()  ???:0

Here is my code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <math.h>
#include <time.h>
#include "testlibscatter.h"
#include <mpi.h>

#define MSGLEN 2048


int main(int argc, char *argv[]){

MPI_Init(&argc, &argv);

int m = atoi(argv[1]), n = atoi(argv[2]), kern_type = atoi(argv[3]);
double kernel[m*n];
int i_rank, ranks;
int param, symm;

MPI_Comm_rank( MPI_COMM_WORLD, &i_rank);
MPI_Comm_size( MPI_COMM_WORLD, &ranks);

int xsize, ysize, maxval;
xsize = 0;
ysize = 0;
maxval = 0;

void * ptr;

switch (kern_type){
    case 1:
    meankernel(m, n, kernel);
    break;
    case 2:
    weightkernel(m, n, param, kernel);
    break;
    case 3:
    gaussiankernel(m, n, param, symm, kernel);
    break;
}

if (i_rank == 0){
    read_pgm_image(&ptr, &maxval, &xsize, &ysize, "check_me2.pgm");
}


MPI_Bcast(&xsize, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&ysize, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&maxval, 1, MPI_INT, 0, MPI_COMM_WORLD);

int flo, start, end, i;
flo = floor(ysize/ranks);

int first, last;

first = start - (m - 1)/2;
last = end + (m - 1)/2;

if (start == 0){
    first = 0;
}
if (end == ysize){
    last = ysize;
}

int sendcounts[ranks];
int displs[ranks];

int first2[ranks];
int last2[ranks];
int c_start2[ranks];
int c_end2[ranks];

int num;
num = (ranks - 1) * (m-1);
printf("num is %d\n", num);

unsigned short int bigpic[xsize*(ysize + num)];


if (i_rank == 0){
    for(i = 0; i < ranks; i++){
        c_start2[i] = i * flo;
        c_end2[i] = (i + 1) * flo; 
        if ( i == ranks - 1){
            c_end2[i] = ysize;
        }
        first2[i] = c_start2[i] - (m - 1)/2;
        last2[i] = c_end2[i] + (m - 1)/2;
        if (c_start2[i] == 0){
            first2[i] = 0;
        }
        if (c_end2[i] == ysize){
            last2[i] = ysize;
        }
        sendcounts[i] = (last2[i] - first2[i]) * xsize; 
    }

    int i, j, k, index, index_disp = 0;
    index = 0;
    displs[0] = 0;

    for (k = 0; k < ranks; k++){
        for (i = first2[k]*xsize; i < last2[k]*xsize; i++){
            bigpic[index] = ((unsigned short int *)ptr)[i];
            index++;
        }
        printf("%d\n", displs[index_disp]);
        index_disp++;
        displs[index_disp] = index;
    }

}

MPI_Bcast(displs, ranks, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(sendcounts, ranks, MPI_INT, 0, MPI_COMM_WORLD);

unsigned short int minipic[xsize*(last-first)];
MPI_Barrier(MPI_COMM_WORLD);
MPI_Scatterv(&bigpic[0], sendcounts, displs, MPI_UNSIGNED_SHORT, minipic, (last-first)*xsize, MPI_UNSIGNED_SHORT, 0, MPI_COMM_WORLD);

MPI_Finalize();
}

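The kernel functions just return an m*n array of doubles used to edit the image, while read_pgm_image returns a void pointer to the values of the image it read. I have tried printing the values of bigpic and they look fine.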

Tags: c++, c, parallel-processing, segmentation-fault, mpi

Solution


In the code shown here, start and end are uninitialized when they are used to compute first and last:

int flo, start, end, i;
         ~~~~~~~~~~
flo = floor(ysize/ranks);

int first, last;

first = start - (m - 1)/2; // <---- start has a random value here
last = end + (m - 1)/2;    // <---- end has a random value here

If those values are very large, the size of minipic may exceed the stack size:

unsigned short int minipic[xsize*(last-first)];
                                  ^^^^^^^^^^ random (possibly large) value
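To get a feel for how close these arrays can come to the stack limit, their sizes can be compared with the soft limit reported by the POSIX `getrlimit` call. This is a rough sketch for POSIX systems only: `image_bytes` and `fits_in_stack` are hypothetical helpers, not part of the original program, and the check is optimistic because it ignores the stack space already in use.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Bytes needed for `rows` rows of `xsize` 16-bit pixels. */
size_t image_bytes(int xsize, int rows)
{
    assert(xsize >= 0 && rows >= 0);
    return (size_t)xsize * (size_t)rows * sizeof(unsigned short);
}

/* Returns 1 if `bytes` is below the soft stack limit, 0 if it is not,
   and -1 if the limit cannot be queried. An unlimited stack counts as
   fitting. */
int fits_in_stack(size_t bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        return 1;
    }
    return bytes < (size_t)rl.rlim_cur;
}
```

With a 2-byte unsigned short, the example image with 2 ranks and m = 3 needs 1600 * 1202 * 2 = 3,846,400 bytes for bigpic and 1600 * 601 * 2 = 1,923,200 bytes for minipic even when first and last are correct. A garbage value of last - first can push the request far beyond any realistic limit (a common Linux default is 8 MiB).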

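A strong indication that this is indeed the cause is that the faulting address 0x7ffe7f60ead8 is very close to the end of the positive part of the virtual address space, which is where most 64-bit operating systems place the stack area of the main thread.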
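One way to make this kind of failure easier to catch, once start and end are set correctly for each rank, is to allocate the receive buffer on the heap and reject an implausible row count before allocating. The following is a minimal sketch; `alloc_rows` is a hypothetical helper, not something from the original code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `rows` rows of `xsize` 16-bit pixels on the heap.
   Returns NULL when the requested shape is not positive, when the byte
   count would overflow size_t, or when malloc itself fails. */
unsigned short *alloc_rows(int xsize, int rows)
{
    if (xsize <= 0 || rows <= 0) {
        return NULL;                    /* catches garbage such as last < first */
    }
    size_t max_count = SIZE_MAX / sizeof(unsigned short);
    if ((size_t)rows > max_count / (size_t)xsize) {
        return NULL;                    /* xsize * rows would overflow */
    }
    return malloc((size_t)xsize * (size_t)rows * sizeof(unsigned short));
}
```

In the program above, minipic could then be obtained with `alloc_rows(xsize, last - first)`, checked against NULL before the MPI_Scatterv call, and released with `free` before MPI_Finalize. Unlike a variable-length array, a failed heap allocation is reported through the return value instead of crashing on first access.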

Always compile with -Wall to get as many diagnostic messages from the compiler as possible.

