首页 > 解决方案 > Scikit-Learn,自定义转换器:ColumnSelectTransformer

问题描述

给定数据来自

%%bash
mkdir data
wget http://dataincubator-wqu.s3.amazonaws.com/mldata/providers-train.csv -nc -P ./ml-data
wget http://dataincubator-wqu.s3.amazonaws.com/mldata/providers-metadata.csv -nc -P ./ml-data

我的任务是完成以下代码片段。

simple_cols = ['BEDCERT', 'RESTOT', 'INHOSP', 'CCRC_FACIL', 'SFF', 'CHOW_LAST_12MOS', 'SPRINKLER_STATUS', 'EXP_TOTAL', 'ADJ_TOTAL']

class ColumnSelectTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        return X[self.columns]

simple_features = Pipeline([
    ('cst', ColumnSelectTransformer(simple_cols)),
])

assert data['RESTOT'].isnull().sum() > 0
assert not np.isnan(simple_features.fit_transform(data)).any()

完成这个断言检查

assert data['RESTOT'].isnull().sum() > 0
assert not np.isnan(simple_features.fit_transform(data)).any()

但是我无法通过断言测试。下面附上我的尝试:

simple_cols = ['BEDCERT', 'RESTOT', 'INHOSP', 'CCRC_FACIL', 'SFF', 'CHOW_LAST_12MOS', 'SPRINKLER_STATUS', 'EXP_TOTAL', 'ADJ_TOTAL']

class ColumnSelectTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)
        X.dropna(inplace=True)
        return X[self.columns].values()

simple_features = Pipeline([
    ('cst', ColumnSelectTransformer(simple_cols)),
])

希望能得到任何帮助,谢谢!

标签: pythonpandasdataframemachine-learningscikit-learn

解决方案


问题出在

return X[self.columns].values()

正确的 return 语句应该是

return X[self.columns].values

推荐阅读