Feature names lost when one-hot encoding

Problem description

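I built a pipeline that uses OneHotEncoder. After fitting and transforming the training and test sets and converting the results to DataFrames, the features have no names. Is there a way to get the name of each encoded feature?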

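from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numerical_cols, categorical_cols, X_train and test are assumed to be defined elsewhere

# Numerical column transformer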
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Categorical column transformer
cat_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Preprocessing pipeline
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, numerical_cols),
        ('cat', cat_transformer, categorical_cols)
    ])


# Fit on the training set only, then apply the already-fitted preprocessor to the test set
X_train_preprocessed = preprocessor.fit_transform(X_train)
test_preprocessed = preprocessor.transform(test)
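The test set should be passed to `transform`, not `fit_transform`: `fit_transform` refits the imputers, scaler and encoder on the test data, so the encoder learns only the categories that happen to appear there, and the column count (and therefore any column names) can stop matching the training set. A minimal sketch with hypothetical toy data (the column name `gender` is made up) and only the encoder step, to keep it short:

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: the test set never contains "Female".
train = pd.DataFrame({"gender": ["Male", "Female", "Female"]})
test = pd.DataFrame({"gender": ["Male", "Male"]})

preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["gender"])]
)
X_train_preprocessed = preprocessor.fit_transform(train)

# Refitting an unfitted copy on the test set learns only the test categories,
# so its output no longer lines up with the training columns.
refit_test = clone(preprocessor).fit_transform(test)

# Reusing the preprocessor fitted on the training set keeps the same columns;
# handle_unknown="ignore" would encode unseen test categories as all zeros.
test_preprocessed = preprocessor.transform(test)

print(X_train_preprocessed.shape, refit_test.shape, test_preprocessed.shape)
# (3, 2) (2, 1) (2, 2)
```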

Tags: python, pipeline, one-hot-encoding

Solution


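You can access the transformers of a ColumnTransformer through its named_transformers_ attribute. You have two transformers, named 'num' and 'cat', so preprocessor.named_transformers_['cat'] gives you your cat_transformer. Then, through the Pipeline's named_steps attribute, you can reach the OneHotEncoder named 'onehot' and its categories_ attribute: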

# For this toy input, numerical_cols = [1] and categorical_cols = [0]
X = [['Male', 1], ['Female', 3], ['Female', 2]]

preprocessor.fit_transform(X)
Out[6]: 
array([[-1.22474487,  0.        ,  1.        ],
       [ 1.22474487,  1.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        ]])

preprocessor.named_transformers_['cat'].named_steps['onehot'].categories_
Out[7]: [array(['Female', 'Male'], dtype=object)]
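To get a DataFrame with named columns, the categories_ lists can be turned into one name per encoded column. A minimal sketch under stated assumptions: the toy column names `gender` and `age` are hypothetical, the `<column>_<category>` naming scheme is an illustrative choice, and the manual approach assumes default OneHotEncoder settings (no `drop`, no infrequent-category grouping). Newer scikit-learn releases (1.0 and later) also offer `get_feature_names_out()` on the fitted ColumnTransformer, which the sketch calls only if it exists.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data mirroring the X used above.
X_train = pd.DataFrame({"gender": ["Male", "Female", "Female"], "age": [1, 3, 2]})
numerical_cols = ["age"]
categorical_cols = ["gender"]

num_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])
cat_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", num_transformer, numerical_cols),
    ("cat", cat_transformer, categorical_cols),
])

X_train_preprocessed = preprocessor.fit_transform(X_train)
# ColumnTransformer may return a sparse matrix; densify it for pandas.
if hasattr(X_train_preprocessed, "toarray"):
    X_train_preprocessed = X_train_preprocessed.toarray()

# One name per one-hot column: the encoder emits one block per input
# column, in the order of categories_ (sorted by default).
onehot = preprocessor.named_transformers_["cat"].named_steps["onehot"]
cat_names = [
    f"{col}_{category}"
    for col, categories in zip(categorical_cols, onehot.categories_)
    for category in categories
]
# Transformer outputs are concatenated in the order they are listed:
# "num" first, then "cat".
feature_names = list(numerical_cols) + cat_names

train_df = pd.DataFrame(X_train_preprocessed, columns=feature_names)
print(train_df.columns.tolist())  # ['age', 'gender_Female', 'gender_Male']

# Newer scikit-learn: names straight from the fitted ColumnTransformer,
# prefixed with the transformer name by default (e.g. "num__age").
if hasattr(preprocessor, "get_feature_names_out"):
    print(list(preprocessor.get_feature_names_out()))
```

The same feature_names list applies to the test set, provided it is produced with preprocessor.transform rather than a second fit.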
