Specifying the columns using strings is only supported for pandas DataFrames

Question

I want to one-hot encode several columns. I have tried several approaches, including plain one-hot encoding, ColumnTransformer, make_column_transformer, Pipeline and get_dummies, but every time I get a different error.

x = dataset.iloc[:, :11].values  # .values returns a NumPy array, so the column names are dropped
y = dataset.iloc[:, 11].values


""" data encoding """

from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


# oe = OrdinalEncoder()
# x = oe.fit_transform(x)

non_cat = ["Make", "Model", "Vehicle", "Transmission", "Fuel"]

onehot_cat = ColumnTransformer([
    ("categorical", OrdinalEncoder(), non_cat),
    ("onehot_categorical", OneHotEncoder(), non_cat)],
    remainder= "passthrough")
x = onehot_cat.fit_transform(x)

Error:

[['ACURA' 'ILX' 'COMPACT' ... 6.7 8.5 33]
['ACURA' 'ILX' 'COMPACT' ... 7.7 9.6 29]
['ACURA' 'ILX HYBRID' 'COMPACT' ... 5.8 5.9 48]
...
['VOLVO' 'XC60 T6 AWD' 'SUV - SMALL' ... 8.6 10.3 27]
['VOLVO' 'XC90 T5 AWD' 'SUV - STANDARD' ... 8.3 9.9 29]
['VOLVO' 'XC90 T6 AWD' 'SUV - STANDARD' ... 8.7 10.7 26]]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
424         try:
--> 425             all_columns = X.columns
426         except AttributeError:

AttributeError: 'numpy.ndarray' object has no attribute 'columns'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-4008371c305f> in <module>
 24     ("onehot_categorical", OneHotEncoder(), non_cat)],
 25     remainder= "passthrough")
 ---> 26 x = onehot_cat.fit_transform(x)
 27 
 28 print('OneHotEncode = ', x.shape)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
527         self._validate_transformers()
528         self._validate_column_callables(X)
--> 529         self._validate_remainder(X)
530 
531         result = self._fit_transform(X, y, _fit_transform_one)

~\Anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_remainder(self, X)
325         cols = []
326         for columns in self._columns:
--> 327             cols.extend(_get_column_indices(X, columns))
328 
329         remaining_idx = sorted(set(range(self._n_features)) - set(cols))

~\Anaconda3\lib\site-packages\sklearn\utils\__init__.py in _get_column_indices(X, key)
425             all_columns = X.columns
426         except AttributeError:
--> 427             raise ValueError("Specifying the columns using strings is only "
428                              "supported for pandas DataFrames")
429         if isinstance(key, str):

ValueError: Specifying the columns using strings is only supported for pandas DataFrames
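A minimal reproduction may make the cause clearer. The sketch below uses a small hypothetical DataFrame (the column names and values are made up, loosely modelled on the data above) and a hypothetical helper `make_ct`, and assumes a reasonably recent scikit-learn. It shows that string column names only work when `ColumnTransformer` receives a DataFrame, and that integer column positions are an option if a NumPy array has to be passed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with the same kind of columns as in the question
df = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "VOLVO"],
    "Vehicle": ["COMPACT", "COMPACT", "SUV - SMALL"],
    "Fuel_Consumption": [8.5, 9.6, 10.3],
})
cat_cols = ["Make", "Vehicle"]


def make_ct(columns):
    # Hypothetical helper: one-hot encode `columns`, pass the rest through unchanged
    return ColumnTransformer(
        [("onehot", OneHotEncoder(), columns)],
        remainder="passthrough",
        sparse_threshold=0,  # always return a dense array
    )


# 1) String names + DataFrame: works, the names are looked up in df.columns
out_df = make_ct(cat_cols).fit_transform(df)

# 2) String names + NumPy array: .values drops the names, so this raises ValueError
try:
    make_ct(cat_cols).fit_transform(df.values)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# 3) Integer positions + NumPy array: works, positions do not need column names
out_arr = make_ct([0, 1]).fit_transform(df.values)

print(out_df.shape, out_arr.shape)
```

The exact wording of the message varies between scikit-learn versions, but in each case it is a `ValueError` raised while resolving the string column names.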

Tags: python-3.x, pandas, scikit-learn

Answer


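I ran into a similar error when trying to make predictions with a model. The model expected a DataFrame, but I was passing it a NumPy array, which has no column names. Your code has the same problem: `.values` converts `dataset` into a NumPy array, so `ColumnTransformer` cannot resolve the names in `non_cat`; pass the DataFrame instead (for example `dataset.iloc[:, :11]` without `.values`). In my case I changed the call from: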

prediction = monitor_model.predict(s_df.to_numpy())

to:

prediction = monitor_model.predict(s_df)
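The same failure can be sketched end to end. This assumes the model is a scikit-learn pipeline whose first step is a column transformer that selects columns by name; the training data, the column names, `LinearRegression` and `model` are hypothetical stand-ins for the answer's `monitor_model` and `s_df`.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: two categorical columns and one numeric column
train = pd.DataFrame({
    "Make": ["ACURA", "ACURA", "VOLVO", "VOLVO"],
    "Fuel": ["Z", "X", "Z", "X"],
    "Engine_Size": [2.0, 2.4, 2.0, 3.0],
})
target = [196, 221, 230, 255]

# The column transformer refers to columns by name, so the model must be fed DataFrames
model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), ["Make", "Fuel"]),
        remainder="passthrough",
    ),
    LinearRegression(),
)
model.fit(train, target)

s_df = train.head(2)  # hypothetical stand-in for the answer's s_df

# Works: the DataFrame still carries the column names
prediction = model.predict(s_df)

# Fails: to_numpy() drops the names, so the string column selection cannot be resolved
try:
    model.predict(s_df.to_numpy())
    numpy_failed = False
except ValueError:
    numpy_failed = True

print(prediction, numpy_failed)
```

If the surrounding code really has to work with NumPy arrays, selecting columns by integer position in the column transformer avoids the dependency on column names.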
