Chapel multilocale installation on AWS appears to be incorrect

Question

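I set up a "cluster" of 12 EC2 instances on AWS running CentOS 7. The nodes share a common NFS file system, and each has its own boot volume, which is where its home directory lives. I installed Chapel on the NFS file system for multilocale use. I can share the installation steps if that helps. gmake appears to run without errors, but gmake check does not produce error-free output. Also, I cannot run the multilocale examples if the Chapel program is located on the local file system. Is that expected?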

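02/03/2021 My mistake. I just noticed this in the documentation: "and copy the compiled binary to all EC2 instances under the same path." I am still trying to figure out why gmake check fails, though.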

After installation, if I set GASNET_SPAWNFN=L, gmake check gives the following output:

.
.
.
Hello, world! (from locale 0 of 4)
Hello, world! (from locale 3 of 4)
Hello, world! (from locale 2 of 4)
Hello, world! (from locale 1 of 4)
GASNET: Exiting after AMUDP_SPMDExit(0)...
gmake: *** [Makefile:209: check] Error 20

If I set GASNET_SPAWNFN=S, gmake check never returns. gmake check --debug=v gives the following output:

.
.
.
updating makefiles....
Updating goal targets....
Considering target file 'all'.
 File 'all' does not exist.
  Considering target file 'FORCE'.
   File 'FORCE' does not exist.
   Finished prerequisites of target file 'FORCE'.
  Must remake target 'FORCE'.
  Successfully remade target file 'FORCE'.
 Finished prerequisites of target file 'all'.
Must remake target 'all'.
Successfully remade target file 'all'.
    Successfully remade target file '/tmp/chpl-centos-11886.deleteme/hello6-taskpar-dist.tmp'.
   Finished prerequisites of target file 'all'.
  Must remake target 'all'.
  Successfully remade target file 'all'.
 Finished prerequisites of target file 'default'.
Must remake target 'default'.
Successfully remade target file 'default'.
gmake: *** [Makefile:209: check] Error 1

If I compile examples/hello6-taskpar-dist.chpl on the shared NFS file system (chpl -o hello6 $CHPL_HOME/examples/hello6-taskpar-dist.chpl) and run ./hello6 -nl 12, it returns:

.
.
.
Hello, world! (from locale 0 of 12 named ip-10-xxx-yy-311.evoforge.org)
Hello, world! (from locale 7 of 12 named ip-10-xxx-yy-348.evoforge.org)
Hello, world! (from locale 10 of 12 named ip-10-xxx-yy-362.evoforge.org)
Hello, world! (from locale 9 of 12 named ip-10-xxx-yy-322.evoforge.org)
Hello, world! (from locale 8 of 12 named ip-10-xxx-yy-316.evoforge.org)
Hello, world! (from locale 6 of 12 named ip-10-xxx-yy-331.evoforge.org)
Hello, world! (from locale 1 of 12 named ip-10-xxx-yy-335.evoforge.org)
Hello, world! (from locale 3 of 12 named ip-10-xxx-yy-353.evoforge.org)
Hello, world! (from locale 5 of 12 named ip-10-xxx-yy-317.evoforge.org)
Hello, world! (from locale 11 of 12 named ip-10-xxx-yy-358.evoforge.org)
Hello, world! (from locale 4 of 12 named ip-10-xxx-yy-364.evoforge.org)
Hello, world! (from locale 2 of 12 named ip-10-xxx-yy-344.evoforge.org)
GASNET: Exiting after AMUDP_SPMDExit(0)...

(IP numbers obfuscated by me)

This appears to be correct.

However, if I compile in my local home directory and run ./hello6 -nl 12 -v, it hangs here:

.
.
.
bash: line 0: cd: /home/centos/chapel: No such file or directory
env: /home/centos/chapel/hello6_real: No such file or directory
env: /home/centos/chapel/hello6_real: No such file or directory
GASNET: slave connecting to 10.xxx.yy.311:43233
ENV parameter: GASNET_LINEBUFFERSZ = 1024                       (default)
GASNET: slave using IP 10.xxx.yy.311
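
The "No such file or directory" errors above are consistent with each node looking for hello6_real at the same path as on the launching node. One way to satisfy that from a node-local directory is to copy both binaries to every node before launching. The following is a minimal sketch, not a tested recipe: the function name distribute_binary is hypothetical, and it assumes the host list is in GASNET_SSH_SERVERS as in this setup, that passwordless ssh already works, and that hostname -I is available to recognize the local node.

```shell
# Hypothetical helper: copy a Chapel launcher and its *_real companion to the
# same absolute path on every host listed in GASNET_SSH_SERVERS.
distribute_binary() {
    bin="$1"
    # Resolve the absolute directory so the remote path matches the local one.
    dir=$(cd "$(dirname "$bin")" && pwd) || return 1
    name=$(basename "$bin")
    # Addresses of this node, so the files are not copied onto themselves.
    local_ips=" $(hostname -I 2>/dev/null) "
    for host in $GASNET_SSH_SERVERS; do
        case "$local_ips" in
            *" $host "*) continue ;;
        esac
        ssh "$host" mkdir -p "$dir" || return 1
        scp "$dir/$name" "$dir/${name}_real" "$host:$dir/" || return 1
    done
}
```

With that defined, running distribute_binary ./hello6 followed by ./hello6 -nl 12 would be the intended usage. Compiling on the shared NFS file system avoids the copy step altogether.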

Here is the output of printchplenv:

$ util/printchplenv
machine info: Linux ip-10-xxx-xx-311.evoforge.org 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64
CHPL_HOME: /nfs/software/chapel-1.23.0 *
script location: /nfs/software/chapel-1.23.0/util/chplenv
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: unknown
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet *
  CHPL_COMM_SUBSTRATE: udp
  CHPL_GASNET_SEGMENT: everything
CHPL_TASKS: qthreads
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
  CHPL_NETWORK_ATOMICS: none
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_LLVM: none
CHPL_AUX_FILESYS: none

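Please let me know if I can provide any other information. Thank you for your help.

02/03/2021 Here is some additional information. This is the final build procedure I followed.

Installed the prerequisites listed for CentOS 7; the versions are listed below. I did not install LLVM because it was not listed as a prerequisite.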

chapel - 1.23.0
gcc - 8.3.1 20190311
m4 - (GNU M4) 1.4.16
perl - v5.16.3
python - 2.7.5
bash - GNU bash, version 4.2.46(2)
gmake - GNU Make 4.2.1
gawk - GNU Awk 4.0.2

Added the following lines to .bashrc on every node (IP addresses obfuscated):

export CHPL_HOME=/nfs/software/chapel-1.23.0
source /nfs/software/chapel-1.23.0/util/setchplenv.bash
export CHPL_COMM=gasnet
export GASNET_SPAWNFN=S
export GASNET_SSH_SERVERS="10.xxx.yy.311 10.xxx.yy.316 10.xxx.yy.317 10.xxx.yy.322 10.xxx.yy.331 10.xxx.yy.335 10.xxx.yy.344 10.xxx.yy.348 10.xxx.yy.353 10.xxx.yy.358 10.xxx.yy.362 10.xxx.yy.364"

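Enabled passwordless ssh between nodes by generating an RSA key pair on each node (ssh-keygen -t rsa). Copied the public keys generated on all nodes into the .ssh/authorized_keys file on every node.

Then built it. The first time, it complained that I should use gmake instead of make.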
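
Since the ssh spawner (GASNET_SPAWNFN=S) relies on non-interactive logins to the listed hosts, it may be worth confirming that each one works before running gmake check. Below is a minimal sketch, assuming the host list is in GASNET_SSH_SERVERS; the function name check_ssh_hosts is hypothetical. The BatchMode=yes option makes ssh fail instead of prompting, so a host that still needs interaction is reported rather than left hanging.

```shell
# Hypothetical helper: report which hosts in GASNET_SSH_SERVERS accept a
# non-interactive ssh login from this node.
check_ssh_hosts() {
    failed=0
    for host in $GASNET_SSH_SERVERS; do
        # BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    # Non-zero exit status if any host could not be reached.
    return "$failed"
}
```

Running check_ssh_hosts on each node shows whether any pairing still prompts or times out.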

cd chapel-1.23.0
gmake
gmake check

Again, thank you in advance for your help.

Tags: linux, amazon-web-services, chapel

Answer


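By default, make check assumes that $HOME is shared across nodes, which is the case on most HPC systems. During the test, it creates a temporary directory as the destination for the compiled test program: $HOME/.chpl. Because your home directory is not NFS-mounted, make check fails.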

You can override the temporary directory used by make check by setting CHPL_CHECK_INSTALL_DIR. If you point that environment variable at an NFS-mounted path, make check should work.
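
For this setup, that might look like the sketch below. The helper name use_shared_check_dir and the default path /nfs/software/chpl-check-tmp are assumptions for illustration; any directory that all nodes mount at the same path should do. Creating the directory up front is a precaution, not something known to be required.

```shell
# Hypothetical helper: point make check's scratch directory at a shared path.
# The default path is an assumption; substitute any NFS-mounted directory.
use_shared_check_dir() {
    dir="${1:-/nfs/software/chpl-check-tmp}"
    mkdir -p "$dir" || return 1
    export CHPL_CHECK_INSTALL_DIR="$dir"
}

# Intended usage (not executed here):
#   use_shared_check_dir /nfs/software/chpl-check-tmp
#   cd "$CHPL_HOME" && gmake check
```

Adding the export line to .bashrc alongside the other Chapel settings would make the override persistent.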

