How to test every value of a data frame and fill specific columns in R?

Problem description

I have a data frame like this:

df <- data.frame(Class = c('A', 'B', 'C'),
                 V1 = c('21, 23', NA, '50, 100'),
                 V2 = c(NA, NA, '13'),
                 V3 = c(NA, '152', '18, 182'))
# Convert column by column; as.character() on a whole data frame deparses each column
df[, 2:4] <- lapply(df[, 2:4], as.character)
str(df)

I set the variables V1, V2 and V3 to character:

df[, 2:4] <- lapply(df[, 2:4], as.character)

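I want to test each variable and count how many values are below 80, between 80 and 110, and above 110, then save these counts as new variables. It should return something like this: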

df <- data.frame(Class = c('A', 'B', 'C'),
                 V1 = c('21, 23', NA, '50, 100'),
                 V2 = c(NA, NA, '13'),
                 V3 = c(NA, '152', '18, 182'), 
                 BELOW = c(2, 0, 3),
                 BETWEEN = c(0, 0, 1),
                 ABOVE = c(0, 1, 1))

How can I do that?

Tags: r, dataframe, iteration

Solution


Assuming your numbers are always separated by commas, this code does what you want:

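# Make sure V1..V3 are character columns (converted column by column)
df[, 2:4] <- lapply(df[, 2:4], as.character)

# For each row, split the comma-separated strings into numbers and count
# how many fall below, within, and above the 80-110 range
counts <- t(apply(df[, 2:4], 1, function(row) {
  row.vec <- na.omit(unlist(row))
  l.Num <- as.numeric(unlist(strsplit(row.vec, ",\\s*")))
  c(BELOW   = sum(l.Num < 80),
    BETWEEN = sum(l.Num >= 80 & l.Num <= 110),  # bounds assumed inclusive
    ABOVE   = sum(l.Num > 110))
}))

# Keep Class and the original columns next to the new counts
newDF <- cbind(df, counts)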
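
The same per-row counting can also be wrapped in a small reusable function, which makes the thresholds adjustable and easy to check. This is only a sketch under a few assumptions: `count_ranges` is a hypothetical helper name (not from any package), the bounds 80 and 110 are treated as inclusive for the BETWEEN count, and the values are always comma-separated as stated above.

```r
# Rebuild the example data from the question
df <- data.frame(Class = c('A', 'B', 'C'),
                 V1 = c('21, 23', NA, '50, 100'),
                 V2 = c(NA, NA, '13'),
                 V3 = c(NA, '152', '18, 182'),
                 stringsAsFactors = FALSE)

# count_ranges() is a hypothetical helper: it takes the strings of one row
# and returns the three counts as a named vector.
# Assumption: lower and upper bounds both count as BETWEEN.
count_ranges <- function(strings, lower = 80, upper = 110) {
  strings <- strings[!is.na(strings)]                      # drop missing cells
  nums <- as.numeric(unlist(strsplit(strings, ",\\s*")))   # "50, 100" -> 50 100
  c(BELOW   = sum(nums < lower),
    BETWEEN = sum(nums >= lower & nums <= upper),
    ABOVE   = sum(nums > upper))
}

# Apply the helper to each row of V1..V3 and bind the counts to df
counts <- t(apply(df[, c("V1", "V2", "V3")], 1, count_ranges))
result <- cbind(df, counts)
print(result)
```

On the question's data, `result` should keep the `Class` column and carry the counts from the expected output (BELOW 2, 0, 3; BETWEEN 0, 0, 1; ABOVE 0, 1, 1). Other thresholds can be passed through `apply`, for example `apply(df[, 2:4], 1, count_ranges, lower = 50, upper = 150)`.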
