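python-3.x - How to change categorical data to numeric data in Python when one column has the int data type
Question
I have a dataset with the following columns: Deviation from Partisanship, Democrat, Disagreement with Party on Social Issues, and gss year for this respondent. The Deviation from Partisanship, Democrat, and Disagreement with Party on Social Issues columns have the object data type, so I have to convert them to strings in order to encode them as numeric data.
'gss year for this respondent' contains years from 1970 to 2000 and has the int data type, so I do not convert it to a string before encoding it. Here is the code I am using: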
#importing libraries
import pandas as pd
from sklearn import preprocessing
#importing data sets
df = pd.read_excel('sec3_data.xlsx')
df.fillna(0, inplace=True)
#converting categorical data to numeric data.
df['Deviation from Partisanship'] = df['Deviation from Partisanship'].astype('str')
le = preprocessing.LabelEncoder()
df['Deviation from Partisanship'] = le.fit_transform(df['Deviation from Partisanship'])
df['Democrat'] = df['Democrat'].astype('str')
le = preprocessing.LabelEncoder()
df['Democrat'] = le.fit_transform(df['Democrat'])
df['Disagreement with Party on Social Issues'] = df['Disagreement with Party on Social Issues'].astype('str')
le = preprocessing.LabelEncoder()
df['Disagreement with Party on Social Issues'] = le.fit_transform(df['Disagreement with Party on Social Issues'])
le = preprocessing.LabelEncoder()
df['gss year for this respondent'] = le.fit_transform(df['gss year for this respondent'])
pd.set_option('display.max_rows', 164)
df
When I run this code, it gives me the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'gss year for this respondent'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-44-fbfcad1e7a05> in <module>
13
14 le = preprocessing.LabelEncoder()
---> 15 df['gss year for this respondent'] = le.fit_transform(df['gss year for this respondent'])
16
17 pd.set_option('display.max_rows', 164)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'gss year for this respondent'
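Any idea why I am getting this error?
Answer
Reposting my own comment as an answer so that others can benefit from it quickly.
This type of error is consistent with referencing a DataFrame column that does not exist. You can quickly check whether the column you are referencing exists in the DataFrame with '<COLUMN_NAME>' in df.columns. If the column exists, the expression should return True.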
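The membership check described above can be sketched as follows. The small DataFrame is a hypothetical stand-in for sec3_data.xlsx, and the trailing space in the column name is only an assumed cause of the mismatch; the real file may differ in some other way, such as capitalization or a different name altogether.

```python
import pandas as pd

# Hypothetical stand-in for the Excel data. The trailing space in the
# last column name is an assumption made for illustration.
df = pd.DataFrame({
    "Democrat": ["yes", "no"],
    "gss year for this respondent ": [1970, 2000],
})

wanted = "gss year for this respondent"

# The check from the answer: False means df[wanted] would raise KeyError.
found_before = wanted in df.columns
print(found_before)

# repr() makes stray whitespace in the actual column names visible.
print([repr(name) for name in df.columns])

# Assuming the only difference is leading/trailing whitespace,
# stripping the column names resolves the lookup.
df.columns = df.columns.str.strip()
found_after = wanted in df.columns
print(found_after)
```

If the check still returns False after stripping, printing the column names as above should show what the column is actually called in the file.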