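python - After binarizing a column of my dataset with sklearn, the result looks incorrect. What is wrong with my code?
Problem description
I am preprocessing a dataset and binarized one of its columns. After binarization, I believe the values are incorrect. The data has 303 observations (rows) and 14 features (columns). The column I want to binarize is the last one.
This is part of my code: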
import pandas as pd
import numpy as np
#importing the dataset
header_names = ['age','sex','cp','trestbps','chol','fbs','restecg','thalach','exang','oldpeak','slope','ca','thal','num']
dataset = pd.read_csv('E:/HCU proj doc/EHR dataset/cleveland_data.csv', names= header_names)
array = dataset.values
# binarize num
from sklearn.preprocessing import Binarizer
x = array[:,13:]
binarize = Binarizer(threshold=0.0).fit(x)
transform_binarize = binarize.transform(x)
array[:,13:]=transform_binarize
print(transform_binarize)
This is what the original data column looks like:
0,2,1,0,0.........1,0,3,1,1,2
This is the output of the code above:
[[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[1.]
[0.]
[1.]
[1.]
[0.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[0.]
[0.]
[1.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]
[0.]
[1.]
[1.]
[1.]
[1.]
[1.]
[1.]
[0.]]
I think the last value is incorrect. I don't understand why this happens.
Solution
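Assuming this is the heart disease dataset from the UCI repository and the csv file is the Cleveland one, these are the correct values for the Binarizer. The original data column has a 0 in its last row, which I think you missed. Try this code: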
# print each row index with its original value and its binarized value
for idx in range(len(x)):
    print(idx, x[idx], transform_binarize[idx])
Output (last rows only; produced under Python 2, hence the L suffix on the integers):
278 [1L] [1.]
279 [0L] [0.]
280 [2L] [1.]
281 [0L] [0.]
282 [3L] [1.]
283 [0L] [0.]
284 [2L] [1.]
285 [4L] [1.]
286 [2L] [1.]
287 [0L] [0.]
288 [0L] [0.]
289 [0L] [0.]
290 [1L] [1.]
291 [0L] [0.]
292 [2L] [1.]
293 [2L] [1.]
294 [1L] [1.]
295 [0L] [0.]
296 [3L] [1.]
297 [1L] [1.]
298 [1L] [1.]
299 [2L] [1.]
300 [3L] [1.]
301 [1L] [1.]
302 [0L] [0.] #<--- I think you missed this row while reading your dataset
If you run this code, you will see that the Binarizer works exactly as it should.
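To make the rule itself explicit, here is a minimal pure-Python sketch of what a threshold of 0.0 does: every value strictly greater than the threshold becomes 1, everything else becomes 0. The `binarize` helper and the `num_column` values are hypothetical and made up for illustration; they are not sklearn's implementation and are not taken from the Cleveland file.

```python
# Hypothetical sample of a 'num'-like column; values are invented for
# illustration and deliberately end in 0, like the real column's last row.
num_column = [0, 2, 1, 0, 3, 1, 0]


def binarize(values, threshold=0.0):
    """Return 1.0 for each value strictly greater than threshold, else 0.0."""
    return [1.0 if v > threshold else 0.0 for v in values]


binarized = binarize(num_column)

# Print index, original value, and binarized value side by side,
# so any suspected mismatch can be checked row by row.
for idx, (raw, flag) in enumerate(zip(num_column, binarized)):
    print(idx, raw, flag)

# Look at the last row explicitly, since that is where the doubt was.
print("last raw value:", num_column[-1], "-> binarized:", binarized[-1])
```

With a column that ends in 0, the last binarized value is 0 as well, which matches the final [0.] in the question's output.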