My own function RemoveNA only half works

Problem description

Here is my code:

PATH <- "https://raw.githubusercontent.com/thomaspernet/data_csv_r/master/data/titanic_csv.csv"

df_titanic <- read.csv(PATH, sep = ",")

RemoveNA = 
function(x)
{
  colmiss = colnames(x)[apply(x,2,anyNA)]
  colmiss
  i = 1
  while ( i <= length(colmiss))
  {
   col_na_col  = match(colsmiss[i],names(x))
   col_na_col 
   for (n in col_na_col)
   {
    #column_name = colsmiss[i]
    cat('  Your missing column is: ' ,'"',colsmiss[i],'"','  and col.no is : ',n, '||||')
    # Create mean
    average_missing <- mean(x[,colsmiss[i]],na.rm =TRUE)
    average_missing
    x[n][is.na(x[n])] = average_missing
   }
   i = i + 1
  }
} 

sum(is.na(df_titanic))
RemoveNA(df_titanic)

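When I run RemoveNA, it prints: Your missing column is: "age" and col.no is : 6 |||| Your missing column is: "fare" and col.no is : 10 |||| That part is fine, but the replacement that follows is not done correctly, because sum(is.na(df_titanic)) is 264 both before and after the call.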

Tags: r

Solution


Here is a more direct approach:

df1 <- data.frame(a= c(NA,1,NA,2), b = 1:4)
df1[] <- lapply(df1, function(x) replace(x,is.na(x),mean(x,na.rm=TRUE)))
df1
#     a b
# 1 1.5 1
# 2 1.0 2
# 3 1.5 3
# 4 2.0 4

Your code has a typo: you typed colsmiss instead of colmiss.

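Also, your function does not return the modified data frame. Its last expression is the while loop, whose value is NULL (returned invisibly), and x inside the function is only a copy of df_titanic. The replaced NA values are therefore not recorded anywhere: the function has to end with x, and the caller has to assign the result.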
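
To see why the return value matters, here is a minimal sketch on a toy data frame. R functions work on a copy of their argument, so changes made to x inside the function never reach the caller's object unless the function returns x and the caller assigns the result. The names fill_no_return and fill_with_return are hypothetical, chosen only for illustration.

```r
# Toy data frame with one missing value (illustrative data, not the Titanic set)
df <- data.frame(a = c(1, NA, 3))

# Hypothetical helper: modifies its local copy but ends on a loop,
# so the function's value is NULL (returned invisibly)
fill_no_return <- function(x) {
  for (n in seq_along(x)) {
    x[[n]][is.na(x[[n]])] <- mean(x[[n]], na.rm = TRUE)
  }
}

# Same body, but the modified copy is returned as the last expression
fill_with_return <- function(x) {
  for (n in seq_along(x)) {
    x[[n]][is.na(x[[n]])] <- mean(x[[n]], na.rm = TRUE)
  }
  x
}

res_none <- fill_no_return(df)
is.null(res_none)   # TRUE: nothing useful came back
sum(is.na(df))      # 1: the caller's df is unchanged

df <- fill_with_return(df)  # assign the result back
sum(is.na(df))      # 0
```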

Your corrected function:

RemoveNA = function(x)
  {
    # Names of the columns that contain at least one NA
    colmiss = colnames(x)[apply(x,2,anyNA)]
    i = 1
    while ( i <= length(colmiss))
    {
      # Position of the current column in the data frame
      col_na_col  = match(colmiss[i],names(x))
      for (n in col_na_col)
      {
        cat('  Your missing column is: ' ,'"',colmiss[i],'"','  and col.no is : ',n, '||||')
        # Mean of the non-missing values in this column
        average_missing <- mean(x[,colmiss[i]],na.rm =TRUE)
        # Replace the NAs in column n with that mean
        x[n][is.na(x[n])] = average_missing
      }
      i = i + 1
    }
    # Return the modified copy so the caller can assign it
    x
  }
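
A short usage sketch of the same idea, assuming that only numeric columns should be imputed. impute_numeric_mean is a hypothetical name and not part of the original answer. It skips non-numeric columns, where mean() would otherwise return NA with a warning.

```r
# Hypothetical variant: replace NA with the column mean, numeric columns only
impute_numeric_mean <- function(x) {
  for (n in seq_along(x)) {
    if (is.numeric(x[[n]]) && anyNA(x[[n]])) {
      x[[n]][is.na(x[[n]])] <- mean(x[[n]], na.rm = TRUE)
    }
  }
  x  # return the modified copy
}

# Toy data (illustrative): a numeric column and a character column, each with one NA
df2 <- data.frame(age = c(20, NA, 40), name = c("a", NA, "c"),
                  stringsAsFactors = FALSE)

sum(is.na(df2))                   # 2
df2 <- impute_numeric_mean(df2)   # assign the result back
sum(is.na(df2))                   # 1: the character NA is left alone
df2$age                           # 20 30 40
```

With the Titanic data the calling pattern is the same: df_titanic <- RemoveNA(df_titanic), then sum(is.na(df_titanic)) to check what is left.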
