首页 > 解决方案 > CUML:无法在多 GPU Dask 集群上训练随机森林模型

问题描述

基于官方的分布式模型训练示例(https://github.com/rapidsai/cuml/blob/branch-0.18/notebooks/random_forest_mnmg_demo.ipynb),我使用 Iris 数据集在多 GPU 桌面上训练随机森林模型集群(一个调度程序节点,三个工作节点),但无法训练模型。结果如下:

CuML accuracy:   0.36666666666666664
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "cuml/ensemble/randomforestclassifier.pyx", line 334, in cuml.ensemble.randomforestclassifier.RandomForestClassifier.__del__
  File "cuml/ensemble/randomforestclassifier.pyx", line 350, in cuml.ensemble.randomforestclassifier.RandomForestClassifier._reset_forest_data
AttributeError: 'NoneType' object has no attribute 'free_treelite_model'

Process finished with exit code 0

我的环境是由 conda 命令构建的:

conda create -n rapids-0.18 -c rapidsai -c nvidia -c conda-forge \
    -c defaults rapids-blazing=0.18 python=3.8 cudatoolkit=10.2

我用于 RAPIDs RandomForestClassifier 的代码是:

import pandas as pd
import cudf
import cuml
from cuml import train_test_split
from cuml.metrics import accuracy_score
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask_cudf
from cuml.dask.ensemble import RandomForestClassifier as cumlDaskRF

# start dask cluster
c = Client('node0:8786')

# Query the client for all connected workers
workers = c.has_what().keys()
n_workers = len(workers)
n_streams = 8  # Performance optimization

# Random Forest building parameters
max_depth = 12
n_bins = 16
n_trees = 1000

# Read data
pdf = pd.read_csv('/data/iris.csv',header = 0, delimiter = ',') # Get complete CSV
cdf = cudf.from_pandas(pdf) # Get cuda dataframe
features = cdf.iloc[:, [0, 1, 2, 3]].astype('float32') # Get data columns
labels = cdf.iloc[:, 4].astype('category').cat.codes.astype('int32') # Get label column

# Split train and test data
X_train, X_test, y_train, y_test = train_test_split(feature, label, train_size=0.8, shuffle=True)

# Distribute data to worker GPUs
n_partitions = n_workers
X_train_dask = dask_cudf.from_cudf(X_train, npartitions=n_partitions)
X_test_dask = dask_cudf.from_cudf(X_test, npartitions=n_partitions)
y_train_dask = dask_cudf.from_cudf(y_train, npartitions=n_partitions)

# Train the distributed cuML model
cuml_model = cumlDaskRF(max_depth=max_depth, n_estimators=n_trees, n_bins=n_bins, n_streams=n_streams)
cuml_model.fit(X_train_dask, y_train_dask)

wait(cuml_model.rfs)  # Allow asynchronous training tasks to finish

# Predict and check accuracy
cuml_y_pred = cuml_model.predict(X_test_dask).compute().to_array()
print("CuML accuracy:  ", accuracy_score(y_test.to_array(), cuml_y_pred))

使用 LocalCUDACluster 并没有改变结果。

你能指出我的错误并给我正确的代码吗?如果我想在训练有素的随机森林模型上评估决策树,我怎样才能得到那些训练有素的决策树?

谢谢你。

标签: pythonrandom-forestdask-distributedrapids

解决方案


推荐阅读