Encode only the non-numeric columns of a DataFrame

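Problem description

I have a DataFrame with both numeric and non-numeric columns. I want to encode only the non-numeric columns and leave the values in the numeric columns unchanged. When I run my code, it encodes every column instead.

Can you help me solve this?

Thanks

Here is my Python code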

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin
from sklearn.preprocessing import LabelEncoder

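class CustomImputer(BaseEstimator, TransformerMixin):
    def __init__(self, strategy='mode', filler='NA'):
        # Store constructor arguments under their own names so that
        # BaseEstimator.get_params() and clone() can find them.
        self.strategy = strategy
        self.filler = filler

    def fit(self, X, y=None):
        if self.strategy in ['mean', 'median']:
            # Comparing X.dtypes to np.number is not a reliable numeric test
            # (integer columns, for instance, do not compare equal), so look
            # for non-numeric columns with select_dtypes instead.
            if X.select_dtypes(exclude=np.number).shape[1] > 0:
                raise ValueError('dtypes mismatch: np.number dtype is '
                                 'required for ' + self.strategy)
        # The learned fill value goes in fill_ so the constructor argument
        # is not overwritten during fit.
        if self.strategy == 'mean':
            self.fill_ = X.mean()
        elif self.strategy == 'median':
            self.fill_ = X.median()
        elif self.strategy == 'mode':
            self.fill_ = X.mode().iloc[0]
        elif self.strategy == 'fill' and isinstance(self.filler, list) \
                and isinstance(X, pd.DataFrame):
            # Pair each column with its own filler value.
            self.fill_ = dict(zip(X.columns, self.filler))
        else:
            self.fill_ = self.filler
        return self

    def transform(self, X, y=None):
        return X.fillna(self.fill_)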

data3 = CustomImputer(strategy='mode').fit_transform(data2)

Tags: python, dataframe

Solution


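You are looking for X.select_dtypes(np.number), which returns only the numeric columns. X.select_dtypes(exclude=np.number) returns the non-numeric ones.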
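A minimal sketch of how select_dtypes could be used to encode only the non-numeric columns, assuming a label encoding per column is what is wanted. The helper name encode_non_numeric and the sample DataFrame are hypothetical and not part of the original question. Note that scikit-learn documents LabelEncoder as intended for target labels; OrdinalEncoder is the feature-oriented alternative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_non_numeric(df):
    """Return a copy of df with only the non-numeric columns label-encoded."""
    out = df.copy()
    # select_dtypes(exclude=np.number) keeps just the non-numeric columns.
    for col in out.select_dtypes(exclude=np.number).columns:
        # LabelEncoder handles one 1-D column at a time; casting to str
        # keeps mixed-type object columns sortable.
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    return out


# Hypothetical sample data, not taken from the original question.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Paris", "Lima", "Paris"],
    "income": [50000.0, 62000.5, 58000.0],
})
encoded = encode_non_numeric(df)
print(encoded)
```

Here the numeric columns age and income pass through untouched, and only city is replaced by integer codes. Because the function works on a copy, the original df is left unchanged.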

