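r - Set a category from the values of another column
Problem description
I want to create a new column, category, with three possible values: low, medium, and high. The values depend on another column. I tried the code below, but it only works for medium and high; low is never assigned.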
admission$category[admission$gre == 0 | admission$gre <= 440]= "low"
admission$category[admission$gre == 440 | admission$gre <= 580] = "Medium"
admission$category[admission$gre == 580 | admission$gre >= 580] = "High"
admission$category=as.factor(admission$category)
Error:
admission$category[admission$gre == 0 | admission$gre <= 440]= "low"
Warning message:
In `[<-.factor`(`*tmp*`, admission$gre == 0 | admission$gre <= 440, :
  invalid factor level, NA generated
str of df:
category: Factor w/ 2 levels "High","Medium": 2 1 1 1 2 1 2 2 2 1 ...
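Solution
You get the error because category is already a factor. A factor only accepts values that are among its existing levels ("High" and "Medium" here), so assigning "low" produces NA with a warning. Convert the column to character before assigning the new values, and convert it back to a factor afterwards if you need one.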
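To make the cause concrete, here is a minimal sketch on a hypothetical three-row data frame (the values are invented for illustration and are not the asker's data). It reproduces the NA and shows the character-conversion fix.

```r
# Hypothetical toy data: category is already a factor with the two levels
# "High" and "Medium", mirroring the str() output in the question.
admission <- data.frame(gre = c(300, 500, 600))
admission$category <- factor(c("Medium", "Medium", "High"))

# Assigning a string that is not an existing level only warns and stores NA.
broken <- admission
broken$category[broken$gre <= 440] <- "low"
print(is.na(broken$category))

# Fix: work on a character vector, then rebuild the factor with explicit levels.
fixed <- admission
fixed$category <- as.character(fixed$category)
fixed$category[fixed$gre <= 440] <- "low"
fixed$category <- factor(fixed$category, levels = c("low", "Medium", "High"))
print(fixed)
```

Passing levels explicitly to factor() also fixes the level order, which otherwise defaults to alphabetical. On a reproducible sample data set, the same approach looks like this: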
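set.seed(100)
# Sample data: category starts as arbitrary letters, gre is a random score
admission = data.frame(category=sample(letters[1:4],100,replace=TRUE),
                       gre = sample(1:600,100))
# Make sure category is character, not factor, so any string can be assigned
admission$category = as.character(admission$category)
# Non-overlapping ranges: each gre value matches exactly one condition
admission$category[admission$gre <= 440]= "low"
admission$category[admission$gre > 440 & admission$gre <= 580] = "Medium"
admission$category[admission$gre > 580] = "High"
table(admission$category)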
  High    low Medium
     3     69     28
Or you can simply do the following:
admission$category = cut(admission$gre,breaks=c(0,440,580,+Inf),
                         labels=c("low","Medium","High"))
table(admission$category)
   low Medium   High
    69     28      3
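One detail worth checking with cut() is how the boundaries are handled. By default the intervals are open on the left and closed on the right, so with breaks starting at 0 a gre of exactly 0 falls outside every interval and becomes NA. The sketch below uses a few hypothetical boundary scores (not from the question) to compare the default with include.lowest = TRUE.

```r
# Hypothetical scores sitting on and around the break points.
gre <- c(0, 1, 440, 441, 580, 581, 800)
breaks <- c(0, 440, 580, Inf)
labs <- c("low", "Medium", "High")

# Default: intervals are (0,440], (440,580], (580,Inf], so 0 is not covered.
default_bins <- cut(gre, breaks = breaks, labels = labs)

# include.lowest = TRUE closes the first interval on the left: [0,440].
inclusive_bins <- cut(gre, breaks = breaks, labels = labs, include.lowest = TRUE)

print(data.frame(gre, default_bins, inclusive_bins))
```

The sample data in the answer is drawn from 1:600, so the difference does not show up there, but the question's first condition mentions gre == 0, in which case include.lowest = TRUE is probably what is wanted.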