How can I run an R script in parallel on a mosix cluster?

Problem description

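I am trying to reproduce the example given in Section 3 of this paper, which runs a simple calculation across multiple instances managed by a cluster. The main computation happens in this script, "sim.R":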

# sim.R
# If the "batch" package has not been installed, run the line below:
# install.packages("batch", repos = "http://cran.cnr.Berkeley.edu")
seed <- 1000
n <- 50
nsim <- 10000
mu <- c(0, 0.5)
sd <- c(1, 1)
library("batch")

parseCommandArgs()
set.seed(seed)
pvalue <- rep(0,nsim)

for(i in 1:nsim) {
        X <- rnorm(n = n, mean = mu[1], sd = sd[1])
        Y <- rnorm(n = n, mean = mu[2], sd = sd[2])
        pvalue[i] <- t.test(X, Y)$p.value
}
power <- mean(pvalue <= 0.05)

out <- data.frame(seed = seed, nsim = nsim, n = n,
        mu = paste(mu, collapse = ","),
        sd = paste(sd, collapse = ","), power = power)
outfilename <- paste("res", seed, ".csv", sep = "")
print(out)
write.csv(out, outfilename, row.names = FALSE)

To run multiple parallel instances of sim.R, there is a second script, "param-sim.R":

library("batch")
seed <- 1000
for(i in 1:10) {
        seed <- rbatch("sim.R", seed = seed, n = 25, mu = c(0, i / 10))
        rbatch.local.run() # My understanding from the linked paper is that this line will do nothing if the script is run on a mosix cluster and not locally.
}

To run it on the mosix cluster, I use the following command from the terminal:

R --vanilla --args RBATCH mosix < param-sim.R

I expected this to produce 10 .csv files, named res1000.csv through res1009.csv. Instead, this is what I get (I am running the command in an Ubuntu environment):

$ R --vanilla --args RBATCH mosix < param-sim.R

R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library("batch")
> seed <- 1000
> for(i in 1:10) {
+   seed <- rbatch("sim.R", seed = seed, n = 25, mu = c(0, i / 10))
+   rbatch.local.run()
+ }
nohup mosrun -e -b -q R --vanilla --args  seed 1000 n 25 mu "c(0,0.1)" < sim.R > sim.Rout1000 & 
rbatch.local.run: no commands have been batched.
nohup: redirecting stderr to stdout
nohup mosrun -e -b -q R --vanilla --args  seed 1001 n 25 mu "c(0,0.2)" < sim.R > sim.Rout1001 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1002 n 25 mu "c(0,0.3)" < sim.R > sim.Rout1002 & 
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1003 n 25 mu "c(0,0.4)" < sim.R > sim.Rout1003 & nohup: 
redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1004 n 25 mu "c(0,0.5)" < sim.R > sim.Rout1004 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1005 n 25 mu "c(0,0.6)" < sim.R > sim.Rout1005 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1006 n 25 mu "c(0,0.7)" < sim.R > sim.Rout1006 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1007 n 25 mu "c(0,0.8)" < sim.R > sim.Rout1007 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1008 n 25 mu "c(0,0.9)" < sim.R > sim.Rout1008 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
nohup mosrun -e -b -q R --vanilla --args  seed 1009 n 25 mu "c(0,1)" < sim.R > sim.Rout1009 & 
nohup: redirecting stderr to stdout
rbatch.local.run: no commands have been batched.
> 
nohup: redirecting stderr to stdout

No .csv files are produced, and every output file (e.g. sim.Rout1000) contains the same text:

mosrun - MOSIX Version 4.3.4
Usage: mosrun [location-options] [program-options] {program} [args]...
       mosrun -S{maxjobs} [location-options] [program-options]
                                                {commands-file}[,{failed-file}]
       mosrun -R{filename} [-O{fd=filename}][,{fd2=fn2}]... [location-options]

       mosrun -I{filename}

  Location options - Node specification:
        -b                      try to start on 'best' available node
        -r{hostname}            start on given host
        -{a.b.c.d}              start on the node of given IP address
        -{n}                    start on given logical node number
        -h                      start on home node
  Other location options:
        -F                      do not fail if requested node is not available
        -L                      lock, disallow automatic migration
        -l                      unlock, allowing automatic migration
        -g                      disallow automatic freezing
        -G                      allow automatic freezing
        -m{mb}                  try to run only on nodes with >= mb free memory
        -A {minutes}            auto checkpoint interval in minutes (0-10000000)
        -N {max}                max. # of checkpoints before cycle (0-10000000)
  Program options:
        -e                      unsupported system calls produce -1/errno=ENOSYS
        -w                      as -e, but print warnings for unsupported calls
        -u                      unsupported system calls kill mosrun (default)
        -d {0-10000}            specify decay rate per second in parts of 10000
        -c                      consider program as a pure CPU job (ignore I/O)
        -n                      reverse '-c', so to include I/O considerations
        -C{filename}            test given checkpoint file
        -X{/directory}          declare private directory
        -z                      program arguments start at argument #0 (not #1)

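This makes me think the program never ran and never entered the cluster queue. I also checked the system processes with the "top" command and found nothing. For the record, I have been able to run simple C++ programs on the mosix cluster successfully.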
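
For anyone trying to reproduce this, the two symptoms (missing res*.csv files, and sim.Rout* files that hold only the mosrun usage text) can be checked with a short R sketch like the one below. It assumes it is run from the same directory in which param-sim.R was launched, and it assumes the file names follow the seeds 1000 to 1009 seen in the rbatch output; the variable names are only illustrative.

```r
# Expected result files, assuming seeds 1000..1009 as in the rbatch output
expected <- paste0("res", 1000:1009, ".csv")
present <- file.exists(expected)
cat(sum(present), "of", length(expected), "result files found\n")

# Per-job output files written by the shell redirection in each mosrun command
rout <- paste0("sim.Rout", 1000:1009)

# Read the first line of each output file (NA if the file is missing or empty)
first_line <- vapply(rout, function(f) {
  if (!file.exists(f)) return(NA_character_)
  x <- readLines(f, n = 1, warn = FALSE)
  if (length(x) == 0) NA_character_ else x
}, character(1))

# TRUE where the file starts with the mosrun banner, i.e. mosrun printed its
# usage message instead of starting R
usage_only <- !is.na(first_line) & startsWith(first_line, "mosrun - MOSIX")
cat(sum(usage_only), "of", length(rout), "output files start with the mosrun banner\n")
```

If every output file starts with the banner, that would be consistent with mosrun rejecting the generated command line rather than R failing inside sim.R.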

Am I missing a key detail that is needed to get this program to run?

Tags: r, mosix
