Equal-width discretization and categorization of a continuous attribute in my data frame

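Problem description

One attribute in my data frame (aggregatedInocme) is continuous, and I want to create a new attribute with the categories (Low, Mid, High) based on its values. I have split the values into three equal-width ranges, as shown in the code below.

I wrote a simple piece of code with a for loop and if statements that checks which range each cell of the attribute falls into and then assigns the corresponding string to it.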

y<-min(data_loanapp$aggregatedInocme)-0
x<-max(data_loanapp$aggregatedInocme)-min(data_loanapp$aggregatedInocme)
c1<-(y+(x/3))
c2<- (y+((2*x)/3))
rr <- c()
 for (val in data_loanapp$aggregatedInocme){
   if(val<= c1) {
      rr[val]<- append(rr[val], 'Low')
     }else if (c1< val<= c2){
      rr[val]<-append(rr[val], "mid")
     }else
      rr[val]<-append(rr[val], "high")
}

rr

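I expected an attribute holding one of the values (Low, Mid, High) for each row. Instead, I keep getting an attribute that is all NA, together with this warning message: In rr[val] <- append(rr[val], "high") : number of items to replace is not a multiple of replacement length

Error: unexpected '}' in "}"

Tags: r, discretization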

Solution


I came up with this:

# classIntervals() is used here to find the equal-width bin boundaries
library(classInt)
intervals <- classIntervals(data_loanapp$aggregatedInocme, 3, style = 'equal')
intervals
# Define and create the categories from those boundaries instead of
# hard-coding them (the original typed in 1442, 4583, 6588, 81000 by hand);
# rightmost.closed = TRUE keeps the maximum value in the "high" bin
data_loanapp$Income_Cat <- c("low", "medium", "high")[
  findInterval(data_loanapp$aggregatedInocme, intervals$brks,
               rightmost.closed = TRUE)]
data_loanapp$Income_Cat
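
For reference, the loop in the question fails for two reasons. R has no chained comparisons, so `c1< val<= c2` is a parse error (it would need `val > c1 && val <= c2`), and `rr[val]` indexes the result by the income value rather than by row position. A minimal sketch of the same equal-width binning with base R's `cut()` avoids both problems and needs no extra package. The sample data frame below is made up for illustration; only the column name `aggregatedInocme` comes from the question.

```r
# Hypothetical sample data; only the column name comes from the question.
data_loanapp <- data.frame(
  aggregatedInocme = c(1442, 3000, 28000, 40000, 60000, 81000)
)

income <- data_loanapp$aggregatedInocme

# Four boundaries give three equal-width bins over the observed range.
breaks <- seq(min(income), max(income), length.out = 4)

# include.lowest = TRUE keeps the minimum value inside the first bin.
data_loanapp$Income_Cat <- cut(income,
                               breaks = breaks,
                               labels = c("Low", "Mid", "High"),
                               include.lowest = TRUE)

# Count how many rows fall into each category.
table(data_loanapp$Income_Cat)
```

`cut()` returns a factor, which is usually what later modelling or plotting code expects; wrap it in `as.character()` if plain strings are needed.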
