Encoding a pandas DataFrame

Problem description

This DataFrame was loaded from SQL. This is the final stage of EDA.

I cannot find the error in the code that encodes the DataFrame.

I also tried encoding each column separately, and that gave the same error.

Previous line:
D1.info()

Result:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 35735 entries, 0 to 46605
Data columns (total 11 columns):
c_CI_Cat                        35735 non-null object
c_Closure_Code                  35735 non-null object
c_WBS                           35735 non-null object
q_No_of_Reassignments           35735 non-null int64
q_No_of_Related_Incidents       35735 non-null float64
q_No_of_Related_Interactions    35735 non-null float64
t_Close_Time                    35735 non-null datetime64[ns]
t_Open_Time                     35735 non-null datetime64[ns]
t_ReopenFlag                    35735 non-null float64
t_TicketWIPDurationDays         35735 non-null float64
y_Priority                      35735 non-null object
dtypes: datetime64[ns](2), float64(4), int64(1), object(4)
memory usage: 3.3+ MB


Error line:
enc = LabelEncoder()
CatVarList = ['c_CI_Cat', 'c_Closure_Code', 'c_WBS', 't_ReopenFlag', 'y_Priority']
for i in CatVarList:
    D1[[i]] = enc.fit_transform(D1[[i]])

Error details:


/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py:235: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-4ba8389dabbc> in <module>
      2 CatVarList = ['c_CI_Cat', 'c_Closure_Code', 'c_WBS','t_ReopenFlag','y_Priority']
      3 for i in CatVarList:
----> 4     D1[[i]] = enc.fit_transform(D1[[i]])
      5 
      6 D1.head()

/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in fit_transform(self, y)
    234         """
    235         y = column_or_1d(y, warn=True)
--> 236         self.classes_, y = _encode(y, encode=True)
    237         return y
    238 

/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode(values, uniques, encode)
    106     """
    107     if values.dtype == object:
--> 108         return _encode_python(values, uniques, encode)
    109     else:
    110         return _encode_numpy(values, uniques, encode)

/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode_python(values, uniques, encode)
     61     # only used in _encode below, see docstring there for details
     62     if uniques is None:
---> 63         uniques = sorted(set(values))
     64         uniques = np.array(uniques, dtype=values.dtype)
     65     if encode:

TypeError: '<' not supported between instances of 'str' and 'int'


These columns must be encoded so that they can be passed to algorithms for further analysis.

Tags: python, pandas, dataframe, scikit-learn

Solution
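Judging from the traceback, the TypeError is raised inside `sorted(set(values))`, which fails when a single object column holds both `str` and `int` values. The DataConversionWarning is a separate issue: `D1[[i]]` is a 2-D one-column frame, while `LabelEncoder` expects a 1-D input such as `D1[i]`. The sketch below is one possible fix, not a confirmed answer. The small `D1` built here is a hypothetical stand-in for the real data, and the `encoders` and `mixed_types` names are introduced only for illustration. It first reports which Python types each column contains, then casts every column to `str` before encoding and uses one encoder per column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for D1: two object columns that mix str and int,
# plus the float flag column from the question.
D1 = pd.DataFrame({
    "c_CI_Cat": ["app", 3, "db", "app"],
    "c_WBS": ["WBS1", "WBS2", "WBS1", 7],
    "t_ReopenFlag": [0.0, 1.0, 0.0, 0.0],
})
CatVarList = ["c_CI_Cat", "c_WBS", "t_ReopenFlag"]

# Diagnosis: list the Python types present in each column.
# More than one type in a column is what breaks sorted(set(values)).
mixed_types = {
    col: sorted({type(v).__name__ for v in D1[col]}) for col in CatVarList
}
print(mixed_types)

# Fix: pass a 1-D Series (D1[col], not D1[[col]]) and cast to str first,
# so every value in the column is comparable.
encoders = {}
for col in CatVarList:
    enc = LabelEncoder()
    D1[col] = enc.fit_transform(D1[col].astype(str))
    encoders[col] = enc  # keep the fitted encoder for inverse_transform

print(D1)
```

Keeping a separate encoder per column means each column's `classes_` mapping stays available for `inverse_transform` later. Note that casting to `str` makes the integer `3` and the string `"3"` share a label; if those should stay distinct, the source data needs cleaning instead.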

