ValueError: Input contains NaN when trying to build a machine learning model

Problem description

I am trying to build a predictive model, but I keep getting this error: raise ValueError("Input contains NaN") ValueError: Input contains NaN. I tried np.any(np.isnan(dataframe)) and np.all(np.isfinite(dataframe)), but those only produce new errors. For example: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Here is the code so far:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
import numpy as np

dataframe = pd.read_csv('file.csv', delimiter=',')

le = LabelEncoder()
dfle = dataframe

dfle2 = dfle.apply(lambda col: le.fit_transform(col.astype(str)), axis=0, result_type='expand')

newdf = dfle2[['column1', 'column2', 'column3', 'column4', 'column5', 'column6', 'column7']]

X = dataframe[['column1', 'column2', 'column4', 'column5', 'column6', 'column7']].values

y = dfle.column3

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ohe = OneHotEncoder()

ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# np.all(np.isfinite(dfle))
# np.any(np.isnan(dfle))
X = ohe.fit_transform(X).toarray()

Tags: python, pandas, scikit-learn, one-hot-encoding, label-encoding

Solution


There are several ways to handle this error. The simplest is to fill the NaN values with 0 when loading the file:

dataframe = pd.read_csv('file.csv', delimiter=',').fillna(0)
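Before filling anything, it can help to see which columns actually contain missing values. The TypeError in the question most likely comes from calling np.isnan / np.isfinite on text (object) columns, which NumPy cannot check; pandas' isna() handles them. Note also that the question builds X from the raw dataframe rather than from the label-encoded dfle2, so any NaN in the file reaches the encoder. The sketch below uses a small made-up DataFrame in place of file.csv (the column contents are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for file.csv: a numeric and a text column, each with one gap.
dataframe = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0],
    "column2": pd.Series(["a", None, "c"], dtype=object),
})

# isna() works on text (object) columns, where np.isnan raises a TypeError.
nan_counts = dataframe.isna().sum()
print(nan_counts)

# Fill every missing value with 0, then confirm nothing is left.
filled = dataframe.fillna(0)
has_nan_after = bool(filled.isna().any().any())
print(has_nan_after)
```

Filling with 0 is quick, but for a text column it mixes the number 0 in with strings, so a per-column fill value may be a better fit.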

Alternatively, you can use scikit-learn's imputation tools to fill in the NaN values:

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute
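As one possible illustration of the sklearn.impute module, the sketch below uses SimpleImputer with the "mean" strategy for a numeric column and "most_frequent" for a text column. The DataFrame is made up for the example; the real column types in file.csv are not known from the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data standing in for two columns of file.csv.
df = pd.DataFrame({
    "column1": [1.0, np.nan, 3.0, 5.0],
    "column2": pd.Series(["a", "b", np.nan, "a"], dtype=object),
})

# Numeric column: replace NaN with the column mean.
num_imputer = SimpleImputer(strategy="mean")
df["column1"] = num_imputer.fit_transform(df[["column1"]]).ravel()

# Text column: replace NaN with the most frequent value.
cat_imputer = SimpleImputer(strategy="most_frequent")
df["column2"] = cat_imputer.fit_transform(df[["column2"]]).ravel()

print(df)
```

Imputing before the encoding step means OneHotEncoder never sees a missing value.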

Several imputation techniques are available, but I would recommend KNNImputer.
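A minimal sketch of KNNImputer follows, assuming the features are already numeric (KNNImputer computes distances between rows, so text columns would need to be encoded first). The array and the choice of n_neighbors=2 are illustrative assumptions, not values taken from the question.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical numeric feature matrix with one missing entry.
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [4.0, 8.0],
])

# Each missing value is replaced by the mean of that feature
# over the n_neighbors nearest rows.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Unlike a constant fill, the imputed value depends on similar rows, which tends to distort the feature distribution less when columns are correlated.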

