python - How to send data across multiple nodes with mpi4py?
Problem description
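Hi all: I am running my code on an HPC cluster, but I cannot transfer data between nodes. I wrote a simple script to test communication between cores on different nodes. First, I used one node with 8 cores; my code (test.py) is: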
from mpi4py import MPI
import sys
import numpy as np
def print_hello(rank, size, name):
    msg = "Hello World! I am process {0} of {1} on {2}.\n"
    sys.stdout.write(msg.format(rank, size, name))
if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    name = MPI.Get_processor_name()
    if rank == 0:
        # Root creates one (64, 64, 64) slice per rank
        data = np.random.random((8, 64, 64, 64))
        print(data.shape)
    else:
        data = None
    # Lowercase scatter is the pickle-based variant: root sends data[i] to rank i
    data = comm.scatter(data, root=0)
    print(data.shape)
    print_hello(rank, size, name)
I ran it with srun -N 1 -n 8 python3 test.py 2>&1 | tee out.txt, which is equivalent to mpirun -np 8 python3 test.py 2>&1 | tee out.txt.
It finished in only 5 seconds, and out.txt contains:
(64, 64, 64)
Hello World! I am process 4 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 5 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 6 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 7 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 1 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 2 of 8 on cn3478.
(64, 64, 64)
Hello World! I am process 3 of 8 on cn3478.
(8, 64, 64, 64)
(64, 64, 64)
Hello World! I am process 0 of 8 on cn3478.
Everything looks fine! But when I use two nodes with 48 cores in total, it goes wrong! The file (test48.py) is:
from mpi4py import MPI
import sys
import numpy as np
def print_hello(rank, size, name):
    msg = "Hello World! I am process {0} of {1} on {2}.\n"
    sys.stdout.write(msg.format(rank, size, name))
if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    name = MPI.Get_processor_name()
    if rank == 0:
        # Root creates one (64, 64, 64) slice per rank
        data = np.random.random((48, 64, 64, 64))
        print(data.shape)
    else:
        data = None
    # Lowercase scatter is the pickle-based variant: root sends data[i] to rank i
    data = comm.scatter(data, root=0)
    print(data.shape)
    print_hello(rank, size, name)
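For scale, here is a quick estimate of how much data each script pushes through `comm.scatter`. It assumes float64 elements (the dtype `np.random.random` returns) and ignores pickle overhead; the helper names `scatter_payload_bytes` and `mib` are hypothetical, for illustration only.

```python
import math

# Assumption: float64 data (8 bytes per element), as produced by np.random.random.
BYTES_PER_FLOAT64 = 8
SLICE_SHAPE = (64, 64, 64)  # shape each rank receives from the scatter


def scatter_payload_bytes(nprocs, slice_shape=SLICE_SHAPE):
    """Hypothetical helper: return (bytes per rank, total bytes root sends).

    Pickle framing overhead is ignored, so real messages are slightly larger.
    """
    per_rank = math.prod(slice_shape) * BYTES_PER_FLOAT64
    return per_rank, per_rank * nprocs


def mib(nbytes):
    """Convert bytes to MiB."""
    return nbytes / 2**20


if __name__ == "__main__":
    for nprocs in (8, 48):
        per_rank, total = scatter_payload_bytes(nprocs)
        print(f"{nprocs} ranks: {mib(per_rank):.0f} MiB per rank, "
              f"{mib(total):.0f} MiB in total")
```

Each rank's slice is 2 MiB in both runs. The 48-rank run sends 96 MiB in total instead of 16 MiB, and with two nodes the slices destined for ranks on the second node must cross the interconnect, whereas the 8-rank run never leaves a single node.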
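I ran yhrun -N 2 -n 48 python3 test48.py 2>&1 | tee out2.txt
It ran for a very long time (more than 1 hour) and printed nothing. I suspect the data transfer is what fails, because when I commented out these two lines: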
data = comm.scatter(data,root=0)
print(data.shape)
the code finished quickly, with this output:
Hello World! I am process 25 of 48 on cn3598.
Hello World! I am process 28 of 48 on cn3598.
Hello World! I am process 30 of 48 on cn3598.
Hello World! I am process 31 of 48 on cn3598.
Hello World! I am process 32 of 48 on cn3598.
Hello World! I am process 40 of 48 on cn3598.
Hello World! I am process 41 of 48 on cn3598.
Hello World! I am process 44 of 48 on cn3598.
Hello World! I am process 24 of 48 on cn3598.
Hello World! I am process 26 of 48 on cn3598.
Hello World! I am process 27 of 48 on cn3598.
Hello World! I am process 29 of 48 on cn3598.
Hello World! I am process 33 of 48 on cn3598.
Hello World! I am process 34 of 48 on cn3598.
Hello World! I am process 35 of 48 on cn3598.
Hello World! I am process 36 of 48 on cn3598.
Hello World! I am process 37 of 48 on cn3598.
Hello World! I am process 38 of 48 on cn3598.
Hello World! I am process 42 of 48 on cn3598.
Hello World! I am process 43 of 48 on cn3598.
Hello World! I am process 45 of 48 on cn3598.
Hello World! I am process 46 of 48 on cn3598.
Hello World! I am process 47 of 48 on cn3598.
Hello World! I am process 39 of 48 on cn3598.
Hello World! I am process 1 of 48 on cn3597.
Hello World! I am process 3 of 48 on cn3597.
Hello World! I am process 4 of 48 on cn3597.
Hello World! I am process 9 of 48 on cn3597.
Hello World! I am process 12 of 48 on cn3597.
Hello World! I am process 15 of 48 on cn3597.
Hello World! I am process 16 of 48 on cn3597.
Hello World! I am process 17 of 48 on cn3597.
Hello World! I am process 20 of 48 on cn3597.
Hello World! I am process 2 of 48 on cn3597.
Hello World! I am process 5 of 48 on cn3597.
Hello World! I am process 6 of 48 on cn3597.
Hello World! I am process 7 of 48 on cn3597.
Hello World! I am process 8 of 48 on cn3597.
Hello World! I am process 10 of 48 on cn3597.
Hello World! I am process 11 of 48 on cn3597.
Hello World! I am process 13 of 48 on cn3597.
Hello World! I am process 14 of 48 on cn3597.
Hello World! I am process 18 of 48 on cn3597.
Hello World! I am process 19 of 48 on cn3597.
Hello World! I am process 21 of 48 on cn3597.
Hello World! I am process 22 of 48 on cn3597.
Hello World! I am process 23 of 48 on cn3597.
(48, 64, 64, 64)
Hello World! I am process 0 of 48 on cn3597.
What is wrong with my code? Or is data transfer between nodes not allowed? Any suggestions are welcome!