首页 > 解决方案 > 在远程集群中为 joblib 使用 dask 后端

问题描述

我们如何使用 dask 在远程集群节点(例如 PSC 网桥)上进行并行计算。举个例子:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from task.distributed import Client
from joblib import Parallel, delayed, parallel_backend
import numpy as np

client = Client()             # create local cluster   LINE-1
# client = Client(processes=False)             # create local cluster LINE-2
# client = Client("scheduler-address:8786")  # or connect to remote cluster

def get_acc(X, y, i):

   X_ =X[i]
   X_train, X_test, y_train, y_test = train_test_split(
       X_, y, test_size=0.1, random_state=42)

   clf = LogisticRegressionCV(
           n_jobs=5,
           cv = 5,
           max_iter=1000,
           solver='liblinear',
           penalty='l1').fit(
           X_train, y_train)
   return clf.score(X_test, y_test)

X, y = make_classification(n_samples=1000, n_features=10000, n_classes=2)
X = np.repeat(X[None,:,:], 150, axis=0)
num_cores = 30
with parallel_backend('dask'):
   scores = Parallel(n_jobs=num_cores, verbose=100)(
               delayed(get_acc)
               (X, y, i)
               for i in range(25)
           )

在使用 LINE-1 创建本地集群时,节点内存 (512GB) 已满,并且不会启动任何计算。在使用第二行 (LINE-2) 时,我得到“OSError: timed out connected to impproc://10.8.10.235/791270/1 after 10s”

但是这个 dask parallel 在我的笔记本电脑上运行得非常好。

标签: daskdask-distributedjoblib

解决方案


我不明白这在你的笔记本电脑上是如何工作的。在这

from task.distributed import Client

task应该是dask


推荐阅读