h2o - Using balance_classes with AutoML H2O produces the error "java.lang.IllegalArgumentException: Error during sampling - too few points?"
Problem description
I am trying to train a multiclass problem with an AutoML H2O model, with nfolds=5 and balance_classes enabled.
The data frame has three distinct labels:
target Count
------------- -------
não conhecido 3789
não provido 11039
provido 3225
[3 rows x 2 columns]
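All models fail with the message "java.lang.IllegalArgumentException: Error during sampling - too few points?".
I don't think there are too few points. Can someone explain what is going on?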
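As a rough check on the class counts above: with nfolds=5, each cross-validation model trains on about four fifths of the rows. The sketch below assumes every class is spread evenly across equal-sized folds (H2O's actual fold assignment may differ slightly); the function name per_fold_training_counts is hypothetical.

```python
from typing import Dict


def per_fold_training_counts(counts: Dict[str, int], nfolds: int = 5) -> Dict[str, float]:
    """Approximate rows per class in each CV training split, assuming each
    class is divided evenly across nfolds equal-sized folds (an assumption)."""
    # Each CV model holds out one fold and trains on the remaining nfolds - 1.
    return {label: n * (nfolds - 1) / nfolds for label, n in counts.items()}


counts = {"não conhecido": 3789, "não provido": 11039, "provido": 3225}
print(per_fold_training_counts(counts))
```

Even the smallest class ("provido") keeps roughly 2,580 rows in every training split, so a plain shortage of observations does not look like a plausible cause.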
Parameters used:
include_algos = ["DRF", "GBM", "StackedEnsemble"],
seed=1234,
nfolds = nfolds,
balance_classes = True,
max_runtime_secs = 86400,
max_models=8,
max_runtime_secs_per_model = 1200,
keep_cross_validation_predictions = True,
verbosity = "debug",
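To gauge how large the training frame should become once balance_classes is on, here is a minimal sketch of the documented behaviour: minority classes are oversampled, and the balanced frame is limited to max_after_balance_size (default 5.0) times the original row count. The exact rule H2O applies internally is an assumption here: this sketch oversamples every class to the majority-class count and then, if needed, scales all classes down uniformly. The function name balanced_class_sizes is hypothetical.

```python
from typing import Dict


def balanced_class_sizes(counts: Dict[str, int],
                         max_after_balance_size: float = 5.0) -> Dict[str, float]:
    """Expected per-class row counts after balancing, under an ASSUMED rule:
    oversample every class up to the majority-class count, then shrink all
    classes uniformly if the total exceeds max_after_balance_size * n_rows."""
    n_rows = sum(counts.values())
    majority = max(counts.values())
    # Oversample each class so it matches the majority class.
    target = {label: float(majority) for label in counts}
    total = sum(target.values())
    cap = max_after_balance_size * n_rows
    if total > cap:
        # Scale every class by the same factor so the balanced frame fits the cap.
        scale = cap / total
        target = {label: size * scale for label, size in target.items()}
    return target


counts = {"não conhecido": 3789, "não provido": 11039, "provido": 3225}
sizes = balanced_class_sizes(counts)
print(sum(counts.values()))  # 18053
print(sizes)                 # every class -> 11039.0
print(sum(sizes.values()))   # 33117.0
```

Under this assumption the balanced frame has 33,117 rows, about 1.8 times the original 18,053 and well below the 5.0 cap, which is consistent with the view that the error is not caused by too few observations.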
日志:
Executando o treinamento do modelo do problema < tipo_decisao >...
AutoML progress: |
02:51:01.681: Project: automl_py_488_sid_932d
02:51:01.681: AutoML job created: 2019.12.10 02:51:01.680
02:51:01.681: Disabling Algo: DeepLearning as requested by the user.
02:51:01.682: Disabling Algo: XGBoost as requested by the user.
02:51:01.682: Disabling Algo: GLM as requested by the user.
02:51:01.682: Build control seed: 1234
02:51:01.706: training frame: Frame key: automl_training_py_488_sid_932d cols: 1225 rows: 18053 chunks: 200 size: 192349542 checksum: 7379304490974335888
02:51:01.706: validation frame: NULL
02:51:01.706: leaderboard frame: NULL
02:51:01.706: blending frame: NULL
02:51:01.706: response column: target
02:51:01.706: fold column: null
02:51:01.706: weights column: null
02:51:01.737: Setting stopping tolerance adaptively based on the training frame: 0.007442610801832542
02:51:01.799: AutoML build started: 2019.12.10 02:51:01.799
█
02:51:04.812: Default Random Forest build failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
██
02:51:07.831: GBM 1 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
██
02:51:10.844: GBM 2 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
██████
02:51:14.878: GBM 3 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
███
02:51:18.897: GBM 4 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
███
02:51:19.915: GBM 5 failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
███
02:51:22.954: Extremely Randomized Trees (XRT) Random Forest build failed: java.lang.IllegalArgumentException: Error during sampling - too few points?
02:51:22.954: AutoML: starting GBM hyperparameter search
████████████████████████████████████| 100%
02:51:41.57: No models were built, due to timeouts or the exclude_algos option. StackedEnsemble builds skipped.
02:51:41.57: AutoML build stopped: 2019.12.10 02:51:41.57
02:51:41.57: AutoML build done: built 0 models
02:51:41.57: AutoML duration: 39.258 sec
Solution
I checked the source code and it doesn't look like it would be due to too few observations.
Can you please run just a single GBM model with balance classes enabled and provide the H2O log? http://docs.h2o.ai/h2o/latest-stable/h2o-docs/logs.html#logging-in-python
I am not sure the current log will give us enough information to figure this out, but I will make a change that adds more information to it in the next release.