python - XGBoost hyperparameter optimization ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class - 1]

Problem description

I am doing some basic hyperparameter optimization for an xgboost model and ran into the following problem. First, my code:

from sklearn.preprocessing import LabelEncoder, OrdinalEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
import xgboost as xgb
from functools import partial
from skopt import space, gp_minimize

<Some preprocessing...> 

x = Oe.fit_transform(x)
y = Ly.fit_transform(y)


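def optimize(params, param_names, x, y):
    # Pair each hyperparameter name with the value proposed by the optimizer.
    params = dict(zip(param_names, params))
    # No random_state and no stratification: every call gets a different split.
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    nc = len(set(y_train))
    xgb_model = xgb.XGBClassifier(use_label_encoder=False, num_class=nc+1, objective="multi:softprob", **params)
    xgb_model.fit(X_train, y_train)
    preds = xgb_model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    # gp_minimize minimizes its objective, so return the negative accuracy.
    return -1.0 * acc


param_space = [
    space.Integer(3, 10, name="max_depth"),
    space.Real(0.01, 0.3, prior="uniform", name="learning_rate"),
]

param_names = [
    "max_depth",
    "learning_rate"
]

# Bind param_names by keyword so the values passed in by gp_minimize
# land in the first positional argument (params).
optimization_function = partial(
    optimize,
    param_names=param_names,
    x=x,
    y=y
)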

result = gp_minimize(
    optimization_function,
    dimensions=param_space,
    n_calls=30,
    n_random_starts=6,
    verbose=True
)

print(dict(zip(param_names, result.x)))

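After some searching on my own, I realized that if I do not pass a random_state to my train/test split to get deterministic results, there is a chance of ending up with a y_train that does not contain labels of the form 0, 1, 2, ..., and therefore getting the following error: ValueError: The label must consist of integer labels of form 0, 1, 2, ..., [num_class - 1].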
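The effect described above can be reproduced without xgboost. The following is a minimal sketch, assuming a toy label array with one rare class (the data and the helper name count_splits_missing_a_class are made up for illustration). It counts how often the training half of a split fails to cover every class, with and without the stratify argument of train_test_split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels (an assumption for illustration): three classes, one of them rare.
y = np.array([0] * 20 + [1] * 20 + [2] * 2)
x = np.arange(len(y)).reshape(-1, 1)


def count_splits_missing_a_class(stratify, n_trials=200):
    """Count the splits whose training labels do not cover every class in y."""
    missing = 0
    for seed in range(n_trials):
        _, _, y_train, _ = train_test_split(
            x, y, test_size=0.2, random_state=seed,
            stratify=y if stratify else None,
        )
        if set(y_train) != set(y):
            missing += 1
    return missing


plain_missing = count_splits_missing_a_class(stratify=False)
stratified_missing = count_splits_missing_a_class(stratify=True)
print("unstratified splits missing a class:", plain_missing)
print("stratified splits missing a class:", stratified_missing)
```

With stratification, each class is allocated to the training set in proportion to its frequency, so in this toy setup the rare class always appears in y_train. Whether an unstratified split drops it depends on the seed.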

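On the other hand, if I do fix the random state, the optimization I am using here loses its purpose: given that I am working with a small dataset, I will always get the same results.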

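In fact, after running my code with random_state=0, for example, I end up with the same optimum after 3 iterations of gp_minimize, no matter which hyperparameter combination it proposes.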

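Update: One could argue that even if I pick a different random state, the best combination I get depends on that particular random state, so in the end I just want to know whether this is the right way to optimize my model.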
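One common way to reduce the dependence on a single split is to score each candidate on several stratified folds and average the results. The sketch below is not taken from the question: cv_objective and its make_model argument are hypothetical names, the data is synthetic, and scikit-learn's GradientBoostingClassifier is used as a stand-in so the example runs without xgboost. Presumably an xgb.XGBClassifier factory could be passed as make_model instead.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score


def cv_objective(params, param_names, x, y, make_model, n_splits=3, n_repeats=2):
    """Return the negative mean accuracy over repeated stratified folds."""
    params = dict(zip(param_names, params))
    # A fixed seed here means every candidate is scored on the same folds,
    # while averaging over folds lessens the influence of any single split.
    cv = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=0
    )
    scores = cross_val_score(
        make_model(**params), x, y, cv=cv, scoring="accuracy"
    )
    return -1.0 * float(np.mean(scores))


# Synthetic three-class data (an assumption for illustration only).
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1, 2], 20)
x_demo = rng.normal(size=(60, 4))
x_demo[:, 0] += y_demo  # shift one feature so the classes are partly separable

score = cv_objective(
    [3, 0.1],
    ["max_depth", "learning_rate"],
    x_demo,
    y_demo,
    make_model=GradientBoostingClassifier,  # stand-in, not the model from the question
)
print(score)
```

Because the folds are stratified, each training fold contains every class, and the returned value can be handed to gp_minimize in the same way as the single-split objective.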

Tags: python, xgboost, random-seed, hyperparameters
