首页 > 解决方案 > 编写一个转换变量模式和类的函数

问题描述

我使用以下代码 ( LINK ) 为假设的 df 清除数据的潜在麻烦方面,称为dataframe

dataframe <- fread(
    "A   B  B.x  C  D   E   iso   year   
     0   3   NA  1  NA  NA  NLD   2009   
     1   4   NA  2  NA  NA  NLD   2009   
     0   5   NA  3  NA  NA  AUS   2011   
     1   5   NA  4  NA  NA  AUS   2011   
     0   0   NA  7  NA  NA  NLD   2008   
     1   1   NA  1  NA  NA  NLD   2008   
     0   1   NA  3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   NA  4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   NA  1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

由于我经常使用它,但我更愿意将它放在一个函数中并尝试以下操作:

cleanfunction <- function(dataframe) {
dataframe <- as.data.frame(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
  stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
dataframe[ind1] <- lapply(dataframe[ind1], as.factor)
}

dfclean <- cleanfunction(dataframe)

然而,这创建了一个转换为因子的变量列表,而不是一个将这些变量转换为因子的数据框。

我该如何解决这个问题?

标签: rfunctionclassdata-cleaning

解决方案


函数从最后一个计算的表达式返回值。在这种情况下,评估的最后一个表达式是

dataframe[ind1] <- lapply(dataframe[ind1], as.factor)

并且<-操作总是只返回右手边的值。所以你只是从 中返回结果lapply,而不是更新后的dataframe.

你只需要添加另一行说

return(dataframe)

要不就

dataframe

到你的功能结束。


推荐阅读