`ColumnTransformer.fit_transform()` result only contains the output of the later transformer

Problem description

My ColumnTransformer contains 2 transformers, but the result of ColumnTransformer.fit_transform() seems to contain only the output of the later transformer:

pos_time
array([[1.24100000e+03, 6.27000000e+02, 1.56279701e+09],
       [1.27100000e+03, 6.90000000e+02, 1.56279701e+09],
       [1.30200000e+03, 7.49000000e+02, 1.56279701e+09],
       ...,
       [1.81600000e+03, 8.60000000e+01, 1.56279703e+09],
       [1.81600000e+03, 8.60000000e+01, 1.56279703e+09],
       [1.81600000e+03, 8.60000000e+01, 1.56279703e+09]])

I think this is caused by the line X=X.drop('Time',axis=1) in TimeTransformer. If I comment out that line, the result of ColumnTransformer.fit_transform() becomes:

pos_time
array([[1241.0, 627.0, Timestamp('2019-07-10 22:16:46.036385'),
        1562797006.036385],
       [1271.0, 690.0, Timestamp('2019-07-10 22:16:46.052012'),
        1562797006.052012],
       [1302.0, 749.0, Timestamp('2019-07-10 22:16:46.067638'),
        1562797006.067638],
       ...,
       [1816.0, 86.0, Timestamp('2019-07-10 22:17:08.327709'),
        1562797028.327709],
       [1816.0, 86.0, Timestamp('2019-07-10 22:17:08.496155'),
        1562797028.496155],
       [1816.0, 86.0, Timestamp('2019-07-10 22:17:08.585392'),
        1562797028.585392]], dtype=object)

But I don't want the third column. I would like to know why this happens and how to fix it. Thank you!

Here is my code. PositionTransformer converts a string into x, y coordinates:

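import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

def position(string):
    # Strip the surrounding parentheses and split on the comma.
    x_,y_=string.lstrip('(').rstrip(')').split(',')
    x_,y_=float(x_),float(y_)
    return x_,y_

class PositionTransformer(BaseEstimator,TransformerMixin):
    def __init__(self):
        pass
    def fit(self,X,y=None):
        return self
    def transform(self,X,y=None):
        # Debug output: what ColumnTransformer passes in.
        print(type(X))
        print(X.head())
        print(X.shape)
        X=X['Position']
        xy=X.apply(position)
        x=xy.apply(lambda x:x[0])
        y=xy.apply(lambda x:x[1])
        # Stack x and y into a two-column array.
        xy=np.c_[x.values,y.values]
        return xy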

TimeTransformer converts a string into a timestamp:

class TimeTransformer(BaseEstimator,TransformerMixin):
    def __init__(self):
        pass
    def fit(self,X,y=None):
        return self
    def transform(self,X,y=None):
        # Work on a copy so the caller's DataFrame is not modified in place.
        X=X.copy()
        X['Time']=pd.to_datetime(X['Time'])
        # Convert each Timestamp to Unix time in seconds.
        X['UnixTime']=X['Time'].apply(lambda t:t.timestamp())
        X=X.drop('Time',axis=1)
        return X

Here is the ColumnTransformer:

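from sklearn.compose import ColumnTransformer

# df is the input DataFrame with 'Position' and 'Time' columns.
time=['Time']
pos=['Position']
time_pos_transformer=ColumnTransformer([
    ('pos_transformer',PositionTransformer(),pos),
    ('time_transformer',TimeTransformer(),time),
])
pos_time=time_pos_transformer.fit_transform(df)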

Tags: python, scikit-learn, sklearn-pandas

Solution


Sorry, the result of ColumnTransformer.fit_transform() is correct. My mistake came from the fact that when I printed pos_time, the console showed only part of it.
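
One way to avoid misreading the console output is to check the array's shape and print it with an explicit float format, since NumPy switches to scientific notation when values are as large as Unix timestamps. The following is only a minimal sketch: the three rows are stand-in values copied from the second output in the question, not the real pos_time, and the fixed-point formatter is just one possible choice.

```python
import numpy as np

# Stand-in for pos_time (assumption): columns are x, y, UnixTime.
pos_time = np.array([
    [1241.0, 627.0, 1562797006.036385],
    [1271.0, 690.0, 1562797006.052012],
    [1302.0, 749.0, 1562797006.067638],
])

# The shape shows how many columns the combined result really has.
print(pos_time.shape)

# Render every float in fixed-point notation instead of scientific notation.
formatted = np.array2string(
    pos_time,
    formatter={'float_kind': '{:.3f}'.format},
)
print(formatted)

# Slicing by column makes it clear which transformer produced what.
print(pos_time[:, :2])  # x, y from PositionTransformer
print(pos_time[:, 2])   # UnixTime from TimeTransformer
```

With a fixed-point format the first two columns read as plain coordinates, which makes it easier to see that the output of both transformers is present.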

