Weka: Data Discretization

zuimeiyujianni 2018-04-27 16:54

NAME
weka.filters.unsupervised.attribute.Discretize

SYNOPSIS
An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. Skips the class attribute if set.
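To make "simple binning" concrete, here is a minimal plain-Java sketch of equal-width binning, the filter's default mode. It is an illustration under stated assumptions, not Weka's source code: the class `EqualWidthBinning` and its methods `cutPoints` and `binIndex` are hypothetical names, and the sample values are made up. A value that lands exactly on a cut point is placed in the lower bin here.

```java
import java.util.Arrays;

// Illustrative only: a plain-Java sketch of equal-width binning, not Weka's source code.
final class EqualWidthBinning {

    // Returns the (bins - 1) interior cut points that split [min, max] into equal-width intervals.
    static double[] cutPoints(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        double[] cuts = new double[bins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    // Maps a value to a zero-based bin index; a value equal to a cut point goes to the lower bin.
    static int binIndex(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value > cuts[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] ages = {18, 22, 25, 31, 40, 58}; // made-up sample data
        double[] cuts = cutPoints(ages, 4);
        for (double age : ages) {
            System.out.println(age + " -> bin " + binIndex(age, cuts));
        }
    }
}
```

With four bins over the range 18 to 58, each interval is 10 wide, so the cut points fall at 28, 38 and 48. Every bin has the same width, but the number of values per bin can differ widely.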

OPTIONS
spreadAttributeWeight -- When generating binary attributes, spread weight of old attribute across new attributes. Do not give each new attribute the old weight.

makeBinary -- Make resulting attributes binary.

debug -- If set to true, filter may output additional info to the console.

bins -- Number of bins.

doNotCheckCapabilities -- If set, the filter's capabilities are not checked before it is built. (Use with caution to reduce runtime.)

findNumBins -- Optimize number of equal-width bins using leave-one-out. Doesn't work for equal-frequency binning.

attributeIndices -- Specify range of attributes to act on. This is a comma-separated list of attribute indices, with "first" and "last" valid values. Specify an inclusive range with "-". E.g.: "first-3,5,6-10,last".

desiredWeightOfInstancesPerInterval -- Sets the desired weight of instances per interval for equal-frequency binning.

useBinNumbers -- Use bin numbers (e.g. BXofY) rather than ranges for discretized attributes.

useEqualFrequency -- If set to true, equal-frequency binning will be used instead of equal-width binning.

binRangePrecision -- The number of decimal places for cut points to use when generating bin labels.

invertSelection -- Set attribute selection mode. If false, only selected (numeric) attributes in the range will be discretized; if true, only non-selected attributes will be discretized.

ignoreClass -- The class index will be unset temporarily before the filter is applied.
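The useEqualFrequency option above switches from equal-width to equal-frequency binning. The following plain-Java sketch shows the general idea of that second mode. It rests on assumptions: the class `EqualFrequencyBinning` and its methods are hypothetical names, the data is made up, and the rule of placing each cut midway between neighbouring distinct values is one common choice, not necessarily what Weka does internally. It takes a bin count only and does not model desiredWeightOfInstancesPerInterval or instance weights.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a plain-Java sketch of equal-frequency binning, not Weka's source code.
final class EqualFrequencyBinning {

    // Picks cut points so that each bin receives roughly values.length / bins values.
    // A cut is placed midway between neighbouring distinct values, so equal values never
    // straddle a boundary; heavy duplication can therefore yield fewer bins than requested.
    static double[] cutPoints(double[] values, int bins) {
        if (values.length == 0 || bins < 1) {
            throw new IllegalArgumentException("need at least one value and one bin");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        List<Double> cuts = new ArrayList<>();
        for (int b = 1; b < bins; b++) {
            // Index of the first value that should belong to the next bin.
            int idx = (int) ((long) b * sorted.length / bins);
            if (idx <= 0 || idx >= sorted.length) {
                continue;
            }
            if (sorted[idx - 1] == sorted[idx]) {
                continue; // no boundary between equal values
            }
            double cut = (sorted[idx - 1] + sorted[idx]) / 2.0;
            if (cuts.isEmpty() || cut > cuts.get(cuts.size() - 1)) {
                cuts.add(cut); // skip repeated cut points
            }
        }
        return cuts.stream().mapToDouble(Double::doubleValue).toArray();
    }

    // Maps a value to a zero-based bin index; a value equal to a cut point goes to the lower bin.
    static int binIndex(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value > cuts[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        double[] incomes = {1, 2, 3, 4, 10, 20, 50, 100}; // made-up, skewed sample data
        double[] cuts = cutPoints(incomes, 4);
        System.out.println("cut points: " + Arrays.toString(cuts));
        for (double v : incomes) {
            System.out.println(v + " -> bin " + binIndex(v, cuts));
        }
    }
}
```

On this skewed sample each of the four bins receives two values. Equal-width binning over the same range of 1 to 100 would put six of the eight values in the first bin and leave one bin empty, which is the usual reason to prefer equal-frequency binning for skewed attributes.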

