How to make FunctionTransformer work inside DataFrameMapper

Problem description

My pandas DataFrame has a column that looks like this:

df = pd.DataFrame([
    ['26.6 km'],
    ['19.67 km'],
    ['18.2 km'],
    ['20.77 km'],
    ['15.2 km'],
], columns=['Mileage'])

I have a function that strips "km" from the column:

def remove_words(column):
    return column.str.split(' ').str[0]

When I put it in my DataFrameMapper:

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)

...it raises the error "'numpy.ndarray' object has no attribute 'str'".

Help!

Tags: python, pandas, function, sklearn-pandas

Solution

Use str.extract:

df['Mileage'] = df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)

Or use str.replace (pass regex=True so the pattern is treated as a regular expression):

df['Mileage'] = df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)

Here is an example:

>>> import pandas as pd
>>> df = pd.DataFrame([
    ['26.6 km'],
    ['19.67 km'],
    ['18.2 km'],
    ['20.77 km'],
    ['15.2 km'],
], columns=['Mileage'])
>>> df['Mileage'].str.extract(r'(\d*\.?\d*)', expand=False).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64
>>> df['Mileage'].str.replace(r'[^\d.]', '', regex=True).astype(float)
0    26.60
1    19.67
2    18.20
3    20.77
4    15.20
Name: Mileage, dtype: float64

Alternatively, if you want to use DataFrameMapper and FunctionTransformer from sklearn_pandas:

from sklearn_pandas import DataFrameMapper, FunctionTransformer

def remove_words(val):
    return val.split(' ')[0]

mapper = DataFrameMapper([
     ('Mileage', [FunctionTransformer(remove_words)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2

With sklearn.preprocessing.FunctionTransformer:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

def remove_words(vals):
    return np.array([v[0].split(' ')[0] for v in vals])

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
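
Note that the values printed above are still strings ('26.6', '19.67', ...), since `split` returns text. If numeric output is wanted, the function can convert each value itself. The sketch below assumes, as in the example above, that a list-style column selector such as `['Mileage']` delivers a 2-D array with one column. The name `remove_words_as_float` is hypothetical, and the function is called directly on a NumPy array here rather than through the mapper.

```python
import numpy as np

def remove_words_as_float(vals):
    # vals is assumed to be 2-D with shape (n_samples, 1), so row[0] is the string.
    # Returning a (n_samples, 1) float array keeps the column layout.
    return np.array([[float(row[0].split(' ')[0])] for row in vals])

vals = np.array([['26.6 km'], ['19.67 km'], ['18.2 km']], dtype=object)
out = remove_words_as_float(vals)
print(out.dtype, out.shape)
```

Such a function could be passed to FunctionTransformer(..., validate=False) in place of remove_words.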

Or use numpy.vectorize:

from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import FunctionTransformer
import numpy as np

func = np.vectorize(lambda x: x.split(' ')[0])

def remove_words(vals):
    return func(vals)

mapper = DataFrameMapper([
     (['Mileage'], [FunctionTransformer(remove_words, validate=False)]),
     ], df_out=True)

print(mapper.fit_transform(df))

  Mileage
0    26.6
1   19.67
2    18.2
3   20.77
4    15.2
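
As for why the function in the question failed: by default DataFrameMapper hands the selected column to the transformer as a NumPy array, and arrays have no `.str` accessor. A minimal sketch of keeping the pandas string methods is to wrap the array back into a Series inside the function. It is shown here on a plain array, without the mapper. The sklearn_pandas README also describes an `input_df=True` option that passes a Series or DataFrame instead, but that route is not tested in this sketch.

```python
import numpy as np
import pandas as pd

def remove_words(column):
    # Flatten to 1-D and wrap in a Series so the .str accessor is available.
    series = pd.Series(np.asarray(column).ravel())
    # Convert back to a NumPy array for downstream transformers.
    return series.str.split(' ').str[0].to_numpy()

arr = np.array(['26.6 km', '19.67 km'], dtype=object)
print(hasattr(arr, 'str'))   # ndarray has no .str accessor
cleaned = remove_words(arr)
print(cleaned)
```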
