Error: DMatrix parameter `enable_categorical` must be set to True

Problem description

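I trained an XGBoost model on a dataset that contains integer, float, and object columns. Now I want to use SHAP values to build a feature-importance plot. However, I get the following error: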

ValueError: DataFrame.dtypes for data must be int, float, bool or category.  When
categorical type is supplied, DMatrix parameter `enable_categorical` must be 
set to `True`

Here is my code:

from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier
import shap

model = XGBClassifier()
pipeline = Pipeline([
    ("preprocessing", preprocessing_pipeline),
    ("classifier", model)
])

trained_pipeline = pipeline.fit(X_train, y_train)
y_pred_test = trained_pipeline.predict_proba(X_test)[:, 1]
y_pred_train = trained_pipeline.predict_proba(X_train)[:, 1]

# Evaluate the model on the test set
X_eval = X_test.copy()
X_eval.insert(0, 'TARGET', y_test)
X_eval.insert(1, 'PREDICTION', y_pred_test)
X_eval.insert(2, 'ACCURATE', X_eval["TARGET"] == (X_eval["PREDICTION"] > 0.5))
X_eval = X_eval.reset_index()

# Plot feature importance using SHAP values
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_eval)

Tags: python-3.x, machine-learning, xgboost

Solution
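
A likely cause, judging from the code in the question: the classifier was fitted inside the pipeline, so it only ever saw the output of the "preprocessing" step, whereas `explainer.shap_values(X_eval)` passes it the raw frame. That frame still contains the object columns, plus the added TARGET, PREDICTION and ACCURATE columns and the old index created by `reset_index()`. Setting `enable_categorical=True` would probably not help here, because the model was not trained on those raw columns. The sketch below runs the data through the fitted preprocessing step first and then passes the result to SHAP. The helper names `split_pipeline_for_shap` and `explain_pipeline` are hypothetical; the step names "preprocessing" and "classifier" are taken from the question, and the sketch assumes the preprocessing step returns purely numeric features.

```python
def split_pipeline_for_shap(fitted_pipeline, X,
                            prep_step="preprocessing",
                            model_step="classifier"):
    """Return (model, X_transformed) from a fitted scikit-learn Pipeline.

    Hypothetical helper: the step names default to those used in the question.
    """
    preprocessor = fitted_pipeline.named_steps[prep_step]
    model = fitted_pipeline.named_steps[model_step]
    # Apply the same fitted preprocessing the model saw during training,
    # so the explainer receives numeric features instead of raw object columns.
    X_transformed = preprocessor.transform(X)
    return model, X_transformed


def explain_pipeline(fitted_pipeline, X):
    """Compute SHAP values for the tree model at the end of the pipeline."""
    import shap  # imported here so the helper above works without shap installed

    model, X_transformed = split_pipeline_for_shap(fitted_pipeline, X)
    explainer = shap.TreeExplainer(model)
    return explainer.shap_values(X_transformed)
```

With the variables from the question, this would be called as `shap_values = explain_pipeline(trained_pipeline, X_test)`. Note that it receives `X_test`, not `X_eval`: the evaluation columns can stay in `X_eval` for inspection, but they should not be fed to the explainer.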

