How to standardize only one column with StandardScaler from sklearn.preprocessing

Question

If I have a list, say

l = [[1, 2], [1, 3], [4, 5], [5, 10]]

how can I normalize only the second column (the values 2, 3, 5, 10) using sklearn.preprocessing.StandardScaler?

Tags: python, scikit-learn, knn, sklearn-pandas

Solution


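StandardScaler has no parameter for picking out individual columns, so you don't need to rely on it for that part. Instead, you can extract the second column of l as follows: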

import numpy as np

# Making an array from list l. dtype needs to be float 
# so that the second column of l_arr can be replaced with
# its scaled counterpart without it being truncated to
# integers.
l_arr = np.array(l, dtype=float)

# Extracting the second column from l_arr
l_arr_2nd_col = l_arr[:,1]

# Converting l_arr_2nd_col into a column vector
l_arr_2nd_col = np.atleast_2d(l_arr_2nd_col).T
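As a side note, slicing with a range such as l_arr[:, 1:2] already returns a 2-D array of shape (4, 1), so it can stand in for the np.atleast_2d(...).T step. A minimal sketch using the same list l from the question:

```python
import numpy as np

l = [[1, 2], [1, 3], [4, 5], [5, 10]]
l_arr = np.array(l, dtype=float)

# Integer indexing drops the column axis: shape (4,)
col_1d = l_arr[:, 1]

# Range slicing keeps it: shape (4, 1), the 2-D input StandardScaler expects
col_2d = l_arr[:, 1:2]

print(col_1d.shape)  # (4,)
print(col_2d.shape)  # (4, 1)
```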

Once that is done, you can use StandardScaler as follows:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(l_arr_2nd_col)

l_arr_2nd_col_scaled = scaler.transform(l_arr_2nd_col)

# ravel is needed, because l_arr[:,1] has shape (4,), but
# l_arr_2nd_col_scaled has shape (4,1).
l_arr[:,1] = l_arr_2nd_col_scaled.ravel()
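To sanity-check the result: with its defaults (with_mean=True, with_std=True), StandardScaler subtracts the column mean and divides by the population standard deviation (ddof=0). The sketch below reproduces that by hand with plain NumPy for the column 2, 3, 5, 10. It is only a check, not a replacement for the scaler, which also stores the fitted statistics (mean_, scale_) for later transform calls.

```python
import numpy as np

col = np.array([2, 3, 5, 10], dtype=float)

# z = (x - mean) / std, using the population standard deviation (ddof=0)
mean = col.mean()          # 5.0
std = col.std(ddof=0)      # sqrt(9.5), about 3.082
z = (col - mean) / std

print(z)                   # approximately [-0.973 -0.649  0.     1.622]
print(z.mean(), z.std())   # about 0.0 and 1.0
```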

At this point, you can either do this:

# Replaces l entirely, including replacing the integers in its
# first column with their floating-point counterparts.
l = l_arr.tolist()

or this:

# Replace only selected components of l
for l_elem, l1_scaled in zip(l, l_arr[:,1]):
    l_elem[1] = l1_scaled
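If you would rather keep the whole step inside scikit-learn (for example as one stage of a Pipeline), sklearn.compose.ColumnTransformer can apply StandardScaler to selected columns and pass the others through unchanged. A minimal sketch under that approach; note that ColumnTransformer puts the transformed columns first and the remainder columns after them, so the original column order has to be restored by hand:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

l = [[1, 2], [1, 3], [4, 5], [5, 10]]
X = np.array(l, dtype=float)

# Scale only column 1; column 0 is passed through untouched
ct = ColumnTransformer(
    [("scale_col1", StandardScaler(), [1])],
    remainder="passthrough",
)
out = ct.fit_transform(X)

# Output order is [scaled column 1, passthrough column 0]; swap it back
X_scaled = out[:, [1, 0]]
print(X_scaled)
```

Because the fitted statistics live inside ct, calling ct.transform(...) on new rows applies the same mean and scale to their second column.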
