A given column is not a column of the dataframe (pandas)

Problem description

I have the following split function:

from typing import Tuple

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

def split_dataframe(
    df: pd.DataFrame,
    target_feature: str,
    split_ratio: float = 0.2
) -> Tuple[pd.DataFrame, pd.DataFrame, np.ndarray, np.ndarray]:
    # Work on a copy so the caller's dataframe is left untouched.
    df_ = df.copy()

    # Separate the features from the target column.
    X = df_.drop(target_feature, axis=1)
    y = df_[target_feature]

    # Encode the target labels as integers.
    encoder = LabelEncoder()
    y = encoder.fit_transform(y)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=split_ratio)

    return X_train, X_test, y_train, y_test

I split the dataframe like this:

X_train, X_test, y_train, y_test = split_dataframe(df, 'Банк')

I then use a pipeline to transform X_train and y_train:

from sklearn.pipeline import Pipeline, FeatureUnion
from mlxtend.feature_selection import ColumnSelector
import category_encoders as ce

cat_pipe = Pipeline(
    [
        ('selector', ColumnSelector(categorical_features)),
        ('encoder', ce.one_hot.OneHotEncoder())
    ]
)

num_pipe = Pipeline(
    [
        ('selector', ColumnSelector(numeric_features)),
        ('scaler', StandardScaler())
    ]
)

preprocessor = FeatureUnion(
    transformer_list=[
        
        ('cat', cat_pipe),
        ('num', num_pipe)
    ]
)

new_df = pipe.fit_transform(X_train, y_train)

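After that I get ValueError: A given column is not a column of the dataframe, caused specifically by KeyError: 'Банк'. I checked that the column exists before passing the dataframe to be split into train and test sets. Everything works if I replace X = df_.drop(target_feature, axis=1) with X = df_, but then the target feature is still in X.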
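One quick way to narrow down an error like this is to compare the columns each selector asks for with the columns X_train actually contains. The sketch below uses a toy frame; missing_columns, requested_features, and every column name other than 'Банк' are hypothetical and not part of the code above.

```python
import pandas as pd


def missing_columns(frame: pd.DataFrame, requested: list) -> list:
    """Return the requested column names that are absent from frame."""
    return [name for name in requested if name not in frame.columns]


# Toy stand-in for X_train: the target 'Банк' has already been dropped.
X_toy = pd.DataFrame({"city": ["x", "y"], "amount": [1.0, 2.0]})

# Hypothetical feature list that still names the target column.
requested_features = ["Банк", "city"]

absent = missing_columns(X_toy, requested_features)
print(absent)  # prints ['Банк']
```

If the list comes back empty for both categorical_features and numeric_features, the selectors are probably not the culprit and it is worth checking which object is actually being fitted.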

Tags: python, pandas

Solution


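I found the mistake: I was calling pipe.fit_transform(X_train, y_train). pipe is not defined in the code shown, so it presumably referred to a different object that still expected the 'Банк' column. I changed the call to preprocessor.fit_transform(X_train, y_train) and it works.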
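The failure mode can be sketched with scikit-learn alone. This is not the original code: it swaps mlxtend's ColumnSelector and category_encoders for scikit-learn's ColumnTransformer and OneHotEncoder, and stale_pipe is an assumed stand-in for whatever pipe actually was, since its definition is not shown. The toy data and the column names other than 'Банк' are also hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data; only the target name 'Банк' comes from the question.
df = pd.DataFrame({
    "Банк": ["A", "B", "A", "B"],
    "city": ["x", "y", "x", "z"],
    "amount": [1.0, 2.0, 3.0, 4.0],
})
X = df.drop("Банк", axis=1)
y = df["Банк"]

# Assumed stale object: built while the target was still listed as a feature.
stale_pipe = Pipeline([
    ("prep", ColumnTransformer([("cat", OneHotEncoder(), ["Банк", "city"])])),
])

try:
    stale_pipe.fit_transform(X, y)
    stale_error = None
except ValueError as exc:
    # scikit-learn reports the missing column as a ValueError.
    stale_error = exc

# The intended preprocessor only names columns that exist in X.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("num", StandardScaler(), ["amount"]),
])
features = preprocessor.fit_transform(X, y)

print(type(stale_error).__name__)
print(features.shape)
```

Fitting the stale object fails because it asks for a column that was dropped from X, while the preprocessor built from the current feature lists fits without complaint. In a notebook, restarting the kernel and re-running the cells is a simple way to rule out leftover objects like this.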

