One Hot Encoder fails to encode categorical data types

Problem description

This is my code for applying One Hot Encoder. However, the columns cat0 through cat9 are not encoded as numbers.

# Import library
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.compose import make_column_transformer 

# Read data
df = pd.read_csv("../ml_analysis/data.csv")

# Separate X and y variables
X = df.drop(columns='target')
y = df['target']

# Prepare numerical and categorical selector
num_cols_selector = selector(dtype_exclude=object)
cat_cols_selector = selector(dtype_include=object)

num_cols = num_cols_selector(X)
cat_cols = cat_cols_selector(X)

# Perform Encoding and normalization
cat_preprocessor = OneHotEncoder(handle_unknown="ignore")
num_preprocessor = StandardScaler()

preprocessor = make_column_transformer([
    (cat_preprocessor, cat_cols),
    (num_preprocessor, num_cols)])

# Fit & transform the encoding and normalization
X = np.array(preprocessor.fit_transform(X))

Then I get this error message: [Error]

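I read the post "XGBoost error - When categorical type is supplied, DMatrix parameter `enable_categorical` must be set to `True`", whose original poster was able to solve the issue. However, my entire X is training data, and it is encoded and normalized before being split into train and test sets, so that answer does not solve my problem.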

Here is a snippet of the dataset: https://drive.google.com/file/d/1A787cBCqOKbyAB59aSsGHzb89jsL_9yt/view?usp=sharing

Could you tell me what I might be doing wrong?

Tags: python, python-3.x, data-cleaning

Solution
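The error text is not reproduced in the question, so the following is only a sketch of two likely culprits, not a confirmed diagnosis. First, `make_column_transformer` takes each `(transformer, columns)` tuple as a separate positional argument, whereas the question wraps them in a single list. Second, when the encoder returns a SciPy sparse matrix, wrapping the result in `np.array(...)` does not yield a 2-D numeric array; passing `sparse_threshold=0` asks the column transformer for a dense array instead. The small DataFrame below is a hypothetical stand-in for `data.csv`, and the column name `cont0` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_selector as selector
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for data.csv (the real file is not available here).
# The categorical columns are forced to object dtype so the selectors below
# behave the same way as they do on string columns loaded by read_csv.
df = pd.DataFrame({
    "cat0": pd.Series(["A", "B", "A", "C"], dtype=object),
    "cat1": pd.Series(["x", "y", "y", "x"], dtype=object),
    "cont0": [0.1, 0.5, 0.9, 0.3],
    "target": [1.0, 2.0, 3.0, 4.0],
})

# Separate X and y variables
X = df.drop(columns="target")
y = df["target"]

# Select categorical (object) and numerical (non-object) column names
cat_cols = selector(dtype_include=object)(X)
num_cols = selector(dtype_exclude=object)(X)

# Pass each (transformer, columns) tuple as its own argument, not as a list.
preprocessor = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), cat_cols),
    (StandardScaler(), num_cols),
    sparse_threshold=0,  # request a dense NumPy array as output
)

# Fit and transform; no np.array(...) wrapper is needed for a dense result.
X_encoded = preprocessor.fit_transform(X)
print(X_encoded.shape)  # (4, 6): 3 + 2 one-hot columns plus 1 scaled column
```

If the encoded data is later split into train and test sets, fitting the preprocessor on the training portion only and calling `transform` on the test portion avoids leaking test statistics into the scaler; `handle_unknown="ignore"` then covers categories that appear only in the test set.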

