How to apply a function to specific columns

Question

I have a sample data frame like this:

well <- c('A1', 'A2', 'A3', 'A4', 'A5')
area <- c(21000, 23400, 26800, 70000, 8000)
length <- c(21, 234, 26, 70, 22)
group <- c('WT', 'Control', 'C2', 'D2', 'E1')

data <- data.frame(well, area, length, group)

I want to apply the function below to remove rows containing outliers from the data frame:

Outlier <- function(x){
  low <- median(x, na.rm=TRUE)-5*(mad(x))
  high <- median(x, na.rm=TRUE)+5*(mad(x))
  out <- if_else(x > high, NA, ifelse(x < low, low, x))
  out
}

How can I apply this function to the data frame while excluding certain columns, such as "well" and "group"?

Tags: r, function, dataframe, apply, outliers

Answer


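We can use lapply() in base R (Outlier() itself calls dplyr's if_else(), so dplyr still has to be loaded for this version):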

data[c('area', 'length')] <- lapply(data[c('area', 'length')], Outlier)
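The question asks how to exclude "well" and "group" rather than list the columns to keep. A minimal base-R sketch of that: compute the columns to process with setdiff() and pass them to lapply(). The Outlier() defined here is a stand-in that uses base ifelse() instead of dplyr's if_else() (an assumption, made only so the block runs without packages); for numeric columns it gives the same result.

```r
# Stand-in for the question's Outlier(), using base ifelse() instead of
# dplyr::if_else() so no packages are required (assumption for this sketch)
Outlier <- function(x) {
  low  <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  ifelse(x > high, NA_real_, ifelse(x < low, low, x))
}

data <- data.frame(
  well   = c('A1', 'A2', 'A3', 'A4', 'A5'),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c('WT', 'Control', 'C2', 'D2', 'E1')
)

# Name the columns to skip; every remaining column gets Outlier() applied
skip <- c('well', 'group')
cols <- setdiff(names(data), skip)   # "area" "length"
data[cols] <- lapply(data[cols], Outlier)
data
```

In the dplyr version the same exclusion can be written as a negative selection, mutate(across(-c(well, group), Outlier)); across(where(is.numeric), Outlier) is another option when the columns to process are exactly the numeric ones.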

Or with dplyr:

library(dplyr) # across() needs dplyr >= 1.0.0
data %>% 
     mutate(across(area:length, Outlier))
#    well  area length   group
#1   A1 21000     21      WT
#2   A2 23400     NA Control
#3   A3 26800     26      C2
#4   A4    NA     NA      D2
#5   A5  8000     22      E1
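To see why exactly those cells become NA, the bounds Outlier() uses can be computed directly. Note that R's mad() is scaled by 1.4826 by default, so the band is wider than five raw median absolute deviations. A small check on both columns (the name len stands in for the "length" column; bounds() is a hypothetical helper):

```r
area <- c(21000, 23400, 26800, 70000, 8000)
len  <- c(21, 234, 26, 70, 22)   # the "length" column

# Hypothetical helper: the bounds Outlier() uses, median +/- 5 * mad()
bounds <- function(x) median(x) + c(-5, 5) * mad(x)

mad(area)      # 5040.84 = 1.4826 * 3400 (3400 is the raw median absolute deviation)
bounds(area)   # -1804.2 48604.2 -> only 70000 is above the upper bound
bounds(len)    # -11.065 63.065  -> 234 and 70 are above the upper bound
```

No value in this example falls below its lower bound, so the capping branch (x < low) never fires here.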

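Note: make sure to change NA to NA_real_ in the "Outlier" function. The columns are double vectors, and dplyr's if_else() (at least in the 1.0.x series) requires its true and false values to have the same type; a bare NA is logical, so the original version raises an error: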

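Outlier <- function(x){
  # bounds: median +/- 5 * MAD (mad() is scaled by 1.4826 by default);
  # na.rm = TRUE in mad() too, otherwise any NA in x makes both bounds NA
  low <- median(x, na.rm = TRUE) - 5 * mad(x, na.rm = TRUE)
  high <- median(x, na.rm = TRUE) + 5 * mad(x, na.rm = TRUE)
  # above high -> NA_real_ (same type as x); below low -> capped at low
  if_else(x > high, NA_real_, ifelse(x < low, low, x))
}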
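The question talks about removing rows, but Outlier() keeps every row: it only turns high values into NA and caps low ones. If dropping rows is the actual goal, one option is to build a logical keep mask over the processed columns and subset with it. The helper in_bounds() below is hypothetical, and unlike Outlier() it also treats values below the lower bound as outliers; that is a deliberate change for this sketch.

```r
data <- data.frame(
  well   = c('A1', 'A2', 'A3', 'A4', 'A5'),
  area   = c(21000, 23400, 26800, 70000, 8000),
  length = c(21, 234, 26, 70, 22),
  group  = c('WT', 'Control', 'C2', 'D2', 'E1')
)

# Hypothetical helper: TRUE where a value lies inside median +/- k * mad();
# missing values count as "not in bounds", so their rows are dropped too
in_bounds <- function(x, k = 5) {
  med <- median(x, na.rm = TRUE)
  dev <- k * mad(x, na.rm = TRUE)
  !is.na(x) & x >= med - dev & x <= med + dev
}

cols <- c('area', 'length')
# One logical vector per column, combined so a row survives only if all are TRUE
keep <- Reduce(`&`, lapply(data[cols], in_bounds))
cleaned <- data[keep, ]
cleaned
```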
