XGBoost: Check failed: valid: Input data contains `inf` or `nan`

Problem description

I am trying to run XGBoost on Windows 10. The relevant part of my code looks like this:

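import numpy as np
from xgboost import XGBClassifier

# x_train and y_train are NumPy arrays prepared earlier (not shown)
model = XGBClassifier()
print(x_train.shape)
print(y_train.shape)

# Sanity checks: look for NaN and infinite values in features and labels
print(np.isnan(x_train).any())
print(np.isnan(y_train).any())
print(np.isinf(x_train).any())
print(np.isinf(y_train).any())
print(np.isfinite(x_train).all())
print(np.isfinite(y_train).all())

model.fit(x_train, y_train)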

It produces the following output:

(4116, 37)  
(4116,)
False  
False
False
False
True
True

The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
Traceback (most recent call last):
  [...]
    model.fit(x_train, y_train)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1173, in fit
    label_transform=label_transform,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 244, in _wrap_evaluation_matrices
    missing=missing,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1172, in <lambda>
    create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 547, in __init__
    enable_categorical=enable_categorical,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 565, in dispatch_data_backend
    feature_types)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 169, in _from_numpy_array
    ctypes.c_int(nthread)))
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 210, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [14:21:29] C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/data/data.cc:945: Check failed: valid: Input data contains `inf` or `nan`

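As the checks above show, my data does not contain any `inf` or `nan` values. Any ideas on how to proceed from here would be greatly appreciated.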

Tags: numpy, xgboost

Solution


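I just ran into the same error. It appears to be caused by very large floating-point values (around 1e300) in the input. I fixed it by applying a log transform.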
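
A likely explanation, offered as an assumption rather than something confirmed in this thread: XGBoost stores feature values as 32-bit floats, so a value such as 1e300 is finite in float64 (and passes the `np.isfinite` checks in the question) but overflows to `inf` when cast to float32. The sketch below uses only NumPy to find such values and apply a log transform to the affected columns. The helper names `find_float32_overflow` and `signed_log1p` and the array `x_demo` are hypothetical, and the signed form sign(x) * log(1 + |x|) is one possible choice of log transform, picked here because it also handles negative values.

```python
import numpy as np


def find_float32_overflow(x):
    """Mask of entries that are finite in float64 but exceed the float32 range."""
    x = np.asarray(x, dtype=np.float64)
    f32_max = np.finfo(np.float32).max  # roughly 3.4e38
    return np.isfinite(x) & (np.abs(x) > f32_max)


def signed_log1p(x):
    """Compute sign(x) * log(1 + |x|), which is defined for negative inputs too."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(np.abs(x))


# Hypothetical stand-in for x_train: the second column holds a huge value.
x_demo = np.array([[1.0, 1e300],
                   [2.0, -3.5]])

print(np.isfinite(x_demo).all())  # True: everything is finite in float64

# Casting to float32 turns 1e300 into inf; silence the overflow warning.
with np.errstate(over="ignore"):
    print(np.isfinite(x_demo.astype(np.float32)).all())  # False

# Locate the offending entries and the columns they belong to.
mask = find_float32_overflow(x_demo)
bad_columns = mask.any(axis=0)
print(np.flatnonzero(bad_columns))

# Apply the log transform to the affected columns only.
x_fixed = x_demo.copy()
x_fixed[:, bad_columns] = signed_log1p(x_fixed[:, bad_columns])
```

After the transform, every entry of `x_fixed` fits comfortably in float32. The same transform would have to be applied to any validation or test data before calling `predict`. Whether a log transform, clipping, or dropping the column is appropriate depends on what the feature represents.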

