How to parallelize xgboost fitting?

Problem description

I am trying to fit many xgboost models with different parameters (e.g. for parameter tuning). To reduce the runtime, they need to run in parallel. However, when the %dopar% loop runs, I get the following error: Error in unserialize(socklist[[n]]) : error reading from connection.

Below is a reproducible example. The problem is specific to xgboost: any other computation involving global variables works inside the %dopar% loop. Can anyone point out what is missing or wrong in this approach?

#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)

#### Data Sim
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n,0,1)

#### Init XGB
train = xgb.DMatrix(data  = X[-((n-10):n),], label = y[-((n-10):n)])
test  = xgb.DMatrix(data  = X[(n-10):n,],    label = y[(n-10):n]) 
watchlist = list(train = train, test = test)

#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)

clusterEvalQ(cl, {
  library(xgboost)
})

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  xgb.train(data = train, watchlist = watchlist, max_depth = i, nrounds = 1000, early_stopping_rounds = 10)$best_score
  # if xgb.train is replaced with anything else, e.g. 1 + y, it works
}

stopCluster(cl) 

Tags: r, parallel-processing

Solution

As HenrikB pointed out in the comments, xgb.DMatrix objects cannot be used across parallel workers: they wrap an external pointer into memory owned by xgboost's C++ core, and external pointers do not survive serialization to the worker processes. To work around this, we move the creation of the objects inside the foreach loop (a short illustration of the underlying serialization problem follows the solution code):

#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)
#> Loading required package: iterators

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
    # BRING CREATION OF XGB MATRIX INSIDE OF foreach
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    
    watchlist = list(dtrain = dtrain, dtest = dtest)
    
    param <- list(max_depth = i, eta = 0.01, verbose = 0,
                  objective = "binary:logistic", eval_metric = "auc")
    bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
    bst$best_score
}

stopCluster(cl) 
pred
#> [[1]]
#> dtest-auc 
#>  0.892138 
#> 
#> [[2]]
#> dtest-auc 
#>  0.987974 
#> 
#> [[3]]
#> dtest-auc 
#>  0.986255 
#> 
#> [[4]]
#> dtest-auc 
#>         1 
#>  ...
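For context on why the original version fails: an xgb.DMatrix is essentially a handle (an external pointer) into memory owned by xgboost's C++ core. The short sketch below is my own illustration, not part of the original answer; it simulates the serialize/unserialize round trip that foreach performs when shipping objects to a socket-cluster worker:

#### Why shipping an xgb.DMatrix to a worker fails (illustration)
library(xgboost)

m <- xgb.DMatrix(data = matrix(runif(20), ncol = 2), label = runif(10))

# R serializes external pointers as null pointers, which is what happens
# when the object is sent over the cluster's socket connection.
m2 <- unserialize(serialize(m, NULL))

# The round-tripped handle no longer points to a valid DMatrix,
# so any use of it on the worker side errors out.
try(dim(m2))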

Benchmarking:

Since xgb.train is itself multithreaded, it may be interesting to look at the speed difference between giving the threads to xgboost and using them to run the tuning rounds in parallel.

To do this, I wrapped the code in a function and benchmarked different combinations:
tune_par <- function(xgbthread, doparthread) {
  
  data(agaricus.train, package='xgboost')
  data(agaricus.test, package='xgboost')
  
  #### Init parallel & run
  cl = parallel::makeCluster(doparthread, setup_strategy = "sequential")
  doParallel::registerDoParallel(cl)
  
  # make the example data available on each worker
  clusterEvalQ(cl, {
    data(agaricus.train, package='xgboost')
    data(agaricus.test, package='xgboost')
  })

  pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    
    watchlist = list(dtrain = dtrain, dtest = dtest)
    
    param <- list(max_depth = i, eta = 0.01, verbose = 0, nthread = xgbthread,
                  objective = "binary:logistic", eval_metric = "auc")
    bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
    bst$best_score
  } 
  
  stopCluster(cl) 
  
  pred
  
}

In my tests, evaluation was faster when giving more threads to xgboost and fewer to the parallel tuning rounds. The most efficient split likely depends on system specs and the amount of data.

# 16 logical cores split between xgb threads and threads in dopar cluster:
microbenchmark::microbenchmark(
  xgb16par1 = tune_par(xgbthread = 16, doparthread = 1),
  xgb8par2 = tune_par(xgbthread = 8, doparthread = 2),
  xgb4par4 = tune_par(xgbthread = 4,doparthread = 4),
  xgb2par8 = tune_par(xgbthread = 2, doparthread = 8),
  xgb1par16 = tune_par(xgbthread = 1,doparthread = 16),
  times = 5
)
#> Unit: seconds
#>       expr      min       lq     mean   median       uq      max neval  cld
#>  xgb16par1 2.295529 2.431110 2.500170 2.519277 2.527914 2.727021     5 a   
#>   xgb8par2 2.301189 2.308377 2.407767 2.363422 2.465446 2.600402     5 a   
#>   xgb4par4 2.632711 2.778304 2.875816 2.825471 2.849003 3.293593     5  b  
#>   xgb2par8 4.508485 4.682284 4.752776 4.810461 4.822566 4.940085     5   c 
#>  xgb1par16 8.493378 8.550609 8.679931 8.768008 8.779718 8.807943     5    d
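Rather than hard-coding the split, the thread budget can also be derived from the machine. Below is a minimal sketch on top of the tune_par() function above (the variable names are my own):

#### Derive the thread split from the core count
total <- parallel::detectCores()            # logical cores, e.g. 16
doparthread <- 2                            # outer workers, chosen by hand
xgbthread <- max(1, total %/% doparthread)  # remaining threads go to xgboost

res <- tune_par(xgbthread = xgbthread, doparthread = doparthread)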
