Applying MinMaxScaler() to a pandas column

Question

I am trying to rescale a column of a pandas DataFrame with sklearn's MinMaxScaler, as follows:

scaler = MinMaxScaler()
y = scaler.fit(df['total_amount'])

But I get the following error:

Traceback (most recent call last):
  File "/Users/edamame/workspace/git/my-analysis/experiments/my_seq.py", line 54, in <module>
    y = scaler.fit(df['total_amount'])
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 308, in fit
    return self.partial_fit(X, y)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 334, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[3.180000e+00 2.937450e+03 6.023850e+03 2.216292e+04 1.074589e+04
   :
 0.000000e+00 0.000000e+00 9.000000e+01 1.260000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Any idea what went wrong?

Tags: python-3.x, pandas, scikit-learn

Answer


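MinMaxScaler expects array-like input with shape [n_samples, n_features], that is, a 2D input. df['total_amount'] is a Series, which is 1D, so the shape check fails. Select the column as a one-column DataFrame instead of a Series by using double square brackets rather than single ones: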

y = scaler.fit(df[['total_amount']])

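That said, from your description it sounds like you want fit_transform rather than just fit (but I could be wrong). fit only learns the column's minimum and maximum and returns the scaler itself, whereas fit_transform also returns the rescaled values: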

y = scaler.fit_transform(df[['total_amount']])
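To make the difference concrete, here is a minimal sketch. It assumes a small made-up DataFrame (the values are illustrative, not the asker's data), and the column name total_amount_scaled is a hypothetical choice for storing the result:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the asker's df
df = pd.DataFrame({"total_amount": [0.0, 90.0, 1260.0, 2937.45]})

scaler = MinMaxScaler()

# fit() learns the min and max, and returns the fitted scaler, not scaled data
fitted = scaler.fit(df[["total_amount"]])

# fit_transform() learns the min and max and returns the rescaled values
# as a 2D NumPy array of shape (n_samples, 1)
scaled = scaler.fit_transform(df[["total_amount"]])

# Flatten to 1D before storing the result back as a column
df["total_amount_scaled"] = scaled.ravel()
```

With the default feature_range of (0, 1), the smallest value in the column maps to 0 and the largest to 1.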

A little more explanation:

If your DataFrame has 100 rows, compare the shapes you get when converting the column to an array:

>>> np.array(df[['total_amount']]).shape
(100, 1)

>>> np.array(df['total_amount']).shape
(100,)

The first has a shape that matches [n_samples, n_features] (as MinMaxScaler requires), whereas the second does not.
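The error message itself points to another route: reshape the 1D values with reshape(-1, 1). The following is a minimal sketch of that alternative, again using made-up data rather than the asker's DataFrame:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the asker's df
df = pd.DataFrame({"total_amount": [3.18, 0.0, 90.0, 1260.0]})

# Single brackets give a Series; its values form a 1D array of shape (4,)
one_d = df["total_amount"].to_numpy()

# reshape(-1, 1) turns it into one feature column of shape (4, 1)
two_d = one_d.reshape(-1, 1)

# The 2D array passes the scaler's input check
scaled = MinMaxScaler().fit_transform(two_d)
```

Both approaches hand the scaler the same 2D data. The double-bracket form is usually shorter when the data already lives in a DataFrame.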

