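h2o - Is it possible to define how many variables the final model uses in H2O Driverless AI?
Question
I am currently exploring the capabilities of H2O DAI. I understand that, during the feature selection/engineering phase, H2O can choose which variables to use and which transformers to apply. But is there a way to configure H2O DAI to limit the maximum number of features it can use? For example, given 100 features, I would like H2O DAI to select only 20 of them and perform feature engineering on those. I have looked through the user manual but have not found any hints so far.
Thanks in advance.
Answer
Several options control the number of features used: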
# Maximum number of columns selected out of original set of original columns, using feature selection
# The selection is based upon how well target encoding (or frequency encoding if not available) on categoricals and numerics treated as categoricals
# This is useful to reduce the final model complexity. First the best
# [max_orig_cols_selected] are found through feature selection methods and then
# these features are used in feature evolution (to derive other features) and in modelling.
#max_orig_cols_selected = 10000
# Maximum number of numeric columns selected, above which will do feature selection
# same as above (max_orig_cols_selected) but for numeric columns.
#max_orig_numeric_cols_selected = 10000
# Maximum number of non-numeric columns selected, above which will do feature selection on all features and avoid treating numerical as categorical
# same as above (max_orig_numeric_cols_selected) but for categorical columns.
#max_orig_nonnumeric_cols_selected = 300
# Like max_orig_cols_selected, but columns above which add special individual with original columns reduced.
#
#fs_orig_cols_selected = 500
# Maximum features per model (and each model within the final model if ensemble) kept.
# Keeps top variable importance features, prunes rest away, after each scoring.
# Final ensemble will exclude any pruned-away features and only train on kept features,
# but may contain a few new features due to fitting on different data view (e.g. new clusters)
# Final scoring pipeline will exclude any pruned-away features,
# but may contain a few new features due to fitting on different data view (e.g. new clusters)
# -1 means no restrictions except internally-determined memory and interpretability restrictions.
# Notes:
# * If interpretability > remove_scored_0gain_genes_in_postprocessing_above_interpretability, then
# every GA iteration post-processes features down to this value just after scoring them. Otherwise,
# only mutations of scored individuals will be pruned (until the final model where limits are strictly applied).
# * If ngenes_max is not also limited, then some individuals will have more genes and features until
# pruned by mutation or by preparation for final model.
# * E.g. to generally limit every iteration to exactly 1 features, one must set nfeatures_max=ngenes_max=1
# and remove_scored_0gain_genes_in_postprocessing_above_interpretability=0, but the genetic algorithm
# will have a harder time finding good features.
#
#nfeatures_max = -1
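See the config.toml file, or look under the expert settings.
Note that you cannot control whether a specific feature ends up being used by the transformers.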
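For the scenario in the question (100 original columns, at most 20 to be used), the options quoted above could be combined roughly as follows. This is a minimal sketch, not a verified recipe: the value 20 is illustrative and taken from the question, and pairing max_orig_cols_selected with nfeatures_max is an assumption about how one might apply the settings together. As the quoted comments state, the final pipeline may still contain a few new features.

```toml
# Sketch of config.toml overrides; the values are illustrative assumptions.

# Keep at most 20 of the original columns after the initial feature selection.
# Per the quoted comments, only these columns feed feature evolution and modelling.
max_orig_cols_selected = 20

# Optionally also cap the number of (engineered) features kept per model.
# -1, the default, means no restriction.
nfeatures_max = 20
```

max_orig_cols_selected limits which original columns enter feature engineering, while nfeatures_max limits how many features each model keeps after scoring, so the two settings answer slightly different questions.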