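python - XGBoost randomly gives a static prediction of "0.5"
Problem description
I am using a scikit-learn pipeline with XGBRegressor. The pipeline runs without any errors. When I predict with it, I run the same data through it many times, and occasionally the prediction comes back as 0.5, whereas the normal prediction range is 1,000-10,000.
For example: (1258.2, 1258.2, 1258.2, 1258.2, 1258.2, 1258.2, 0.5, 1258.2, 1258.2, 1258.2, 1258.2)
- The input data is exactly the same
- The environment is the same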
The pipeline and hyperparameter search:

    import numpy as np
    import xgboost
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # X (a pandas DataFrame) and y are the training data; they are not shown here.

    # Numeric columns: mean imputation, then standardization
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='mean')),
        ('scaler', StandardScaler())])

    # Categorical columns: constant imputation, then one-hot encoding
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])

    numeric_features = X.select_dtypes(
        include=['int64', 'float64']).columns
    categorical_features = X.select_dtypes(
        include=['object']).columns

    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)])

    # Number of trees
    n_estimators = [int(x) for x in np.linspace(start=50, stop=1000, num=10)]
    # Maximum number of levels in tree
    max_depth = [int(x) for x in np.linspace(1, 32, 32, endpoint=True)]
    # Booster
    booster = ['gbtree', 'gblinear', 'dart']
    # Selecting gamma
    gamma = [i / 10.0 for i in range(0, 5)]
    # Learning rate
    learning_rate = np.linspace(0.01, 0.2, 15)
    # Evaluation metric
    # eval_metric = ['rmse','mae']
    # Regularization
    reg_alpha = [1e-5, 1e-2, 0.1, 1, 100]
    reg_lambda = [1e-5, 1e-2, 0.1, 1, 100]
    # Min child weight
    min_child_weight = list(range(1, 6, 2))
    # Samples
    subsample = [i / 10.0 for i in range(6, 10)]
    colsample_bytree = [i / 10.0 for i in range(6, 10)]

    # Create the random grid
    random_grid = {'n_estimators': n_estimators,
                   'max_depth': max_depth,
                   'booster': booster,
                   'gamma': gamma,
                   'learning_rate': learning_rate,
                   # 'eval_metric' : eval_metric,
                   'reg_alpha': reg_alpha,
                   'reg_lambda': reg_lambda,
                   'min_child_weight': min_child_weight,
                   'subsample': subsample,
                   'colsample_bytree': colsample_bytree
                   }

    # Use the random grid to search for best hyperparameters
    # First create the base model to tune
    rf = xgboost.XGBRegressor(objective='reg:squarederror', n_jobs=4)
    # Random search of parameters, using 3-fold cross validation,
    # searching across 100 different combinations with 4 parallel jobs
    rf_random = RandomizedSearchCV(estimator=rf,
                                   param_distributions=random_grid,
                                   n_iter=100, cv=3, verbose=0,
                                   random_state=42, n_jobs=4)

    pipe = Pipeline(steps=[('preprocessor', preprocessor),
                           ('regressor', rf_random)])
    pipe.fit(X, y)
What could be the problem?
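Before hunting for a cause, it may help to pin down exactly which calls misbehave. The sketch below is only an illustration: `find_inconsistent_calls` and the stand-in predictor are hypothetical names made up here, and the stand-in simply imitates the pattern from the example (0.5 on the 7th of 11 calls) rather than reproducing the real bug.

```python
from collections import Counter


def find_inconsistent_calls(predict_fn, row, n_calls=50):
    """Call predict_fn(row) n_calls times.

    Returns (majority_value, odd_calls), where odd_calls is a list of
    (call_index, value) pairs for every call that disagreed with the majority.
    """
    results = [float(predict_fn(row)) for _ in range(n_calls)]
    majority_value, _ = Counter(results).most_common(1)[0]
    odd_calls = [(i, v) for i, v in enumerate(results) if v != majority_value]
    return majority_value, odd_calls


# Stand-in predictor for illustration only: returns 0.5 on its 7th call.
def make_flaky_predictor():
    state = {"calls": 0}

    def predict(row):
        state["calls"] += 1
        return 0.5 if state["calls"] == 7 else 1258.2

    return predict


majority, odd = find_inconsistent_calls(make_flaky_predictor(), row=None, n_calls=11)
print(majority, odd)  # 1258.2 [(6, 0.5)]
```

With the fitted pipeline from the question, the call would presumably look like `find_inconsistent_calls(lambda r: pipe.predict(r)[0], X.iloc[[0]])`. If the odd calls always return exactly 0.5, it may be worth noting that 0.5 is the default `base_score` in XGBoost releases before 2.0; whether that is related here is a guess, not something the question establishes.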
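Solution
If you are getting some abnormally low predictions, it may indicate outliers in the dependent variable. I suggest reading up on outliers and on the different strategies for dealing with them.
It is generally not a good idea to fit a model on all data samples without first removing outliers. Doing so tends to produce worse and less representative metrics.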
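As a rough sketch of how to check the target for outliers, the snippet below applies the common 1.5 × IQR rule using only the standard library. The helper name `iqr_outlier_mask`, the threshold `k=1.5`, and the sample values are assumptions chosen for illustration; they are not taken from the question's data.

```python
import statistics


def iqr_outlier_mask(values, k=1.5):
    """Return a list of booleans, True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


# Made-up target values for illustration, not the asker's data.
y_example = [1200.0, 1350.0, 1258.2, 1500.0, 1420.0, 1310.0, 98000.0]
mask = iqr_outlier_mask(y_example)
print([v for v, is_out in zip(y_example, mask) if is_out])  # [98000.0]
```

On the real data this would be something like `iqr_outlier_mask(list(y))`. What to do with the flagged rows (drop them, cap them, or transform the target) depends on the dataset, so treat the rule as a starting point rather than a fix.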