Mutate with apply statement in dplyr chain

Problem description

For each row of the iris dataset, I want to count how many variables are greater than 5 (and how many are less than 5):

iris %>%
  mutate(n_greater = apply(., 1, function(x) length(x[x > 5])),
         n_less = apply(., 1, function(x) length(x[x < 5])))  ## Incorrect: the factor column is counted too

iris %>% select_if(is.numeric) %>%
  mutate(n_greater = apply(., 1, function(x) length(x[x > 5])),
         n_less = apply(., 1, function(x) length(x[x < 5])))  ## Correct, but I need the non-numeric column (Species) in the end

iris %>%
  mutate_if(is.numeric, n_greater = apply(., 1, function(x) length(x[x > 5])))  ## Does not work

I have also tried using which inside the custom function, without success. The base R code below does what I want, but I would like a dplyr solution.

n_greater = apply(iris[, -5], 1, function(x) length(x[x > 5]))  # drop Species (column 5), not Sepal.Length
n_less = apply(iris[, -5], 1, function(x) length(x[x < 5]))
final_iris = cbind(iris, n_greater, n_less)  # This is the desired result

Tags: r

Solution


We can use rowSums on a logical matrix. In the OP's attempt, the comparison is also applied to the non-numeric column. Here, select_if drops the non-numeric columns before the comparison:

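library(dplyr)
iris %>%
  # compare only the numeric columns against 5, then count the TRUEs per row
  mutate(n_greater = rowSums(select_if(., is.numeric) > 5),
         # ncol(.) - 1 assumes exactly one non-numeric column (Species);
         # this counts values <= 5, not strictly < 5
         n_less = (ncol(.) - 1) - n_greater)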
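
As a rough illustration of why the original apply() call miscounts: apply() first converts the data frame to a matrix, and because Species is a factor the whole matrix becomes character, so x > 5 turns into a string comparison. The sketch below uses only base R and dplyr on the built-in iris data. It checks the coercion and then confirms that the rowSums approach agrees with apply() restricted to the numeric columns. The names m, by_apply and by_rowsums are illustrative only.

```r
library(dplyr)

# apply() coerces its input with as.matrix(); one factor column
# is enough to turn every cell into a character string
m <- as.matrix(iris)
is.character(m)

# count values above 5 over the numeric columns only, in two ways
by_apply <- apply(iris[sapply(iris, is.numeric)], 1, function(x) sum(x > 5))
by_rowsums <- iris %>%
  mutate(n_greater = rowSums(select_if(., is.numeric) > 5)) %>%
  pull(n_greater)

# check that both approaches give the same per-row counts
all(by_apply == by_rowsums)
```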

Rather than hard-coding ncol(.) - 1, it may be better either to store the number of numeric columns or to get the names of the numeric columns first:

nm1 <- iris %>%
  select_if(is.numeric) %>%
  names()
iris %>%
  # all_of() makes it explicit that nm1 is an external character vector
  mutate(n_greater = rowSums(select(., all_of(nm1)) > 5),
         n_less = length(nm1) - n_greater)
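
One detail worth checking: subtracting n_greater from the number of numeric columns counts every value that is not greater than 5, so a value of exactly 5 lands in n_less, whereas the question used the strict test x < 5. If the strict count is what is wanted, a second rowSums call is one way to get it. This is a minimal sketch rather than part of the original answer; num_cols, res and n_not_greater are illustrative names.

```r
library(dplyr)

# names of the numeric columns, found with base R
num_cols <- names(iris)[sapply(iris, is.numeric)]

res <- iris %>%
  mutate(n_greater = rowSums(select(., all_of(num_cols)) > 5),
         # strictly below 5, as in the question
         n_less = rowSums(select(., all_of(num_cols)) < 5),
         # complement of n_greater: also counts values equal to 5
         n_not_greater = length(num_cols) - n_greater)

# row 5 has Sepal.Length == 5, so the two definitions differ there
res[5, c("Sepal.Length", "n_greater", "n_less", "n_not_greater")]
```

Which of the two definitions is right depends on how values equal to the threshold should be treated.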

Another option is to create one logical vector per column with map, add them element-wise with reduce and +, and then bind the resulting columns to the original dataset:

library(purrr)
iris %>%
  select_if(is.numeric) %>%
  transmute(n_greater = map(., `>`, 5) %>% reduce(`+`),
            n_less = ncol(.) - n_greater) %>%
  bind_cols(iris, .)

Or, instead of map, we can use mutate_all:

iris %>%
  select_if(is.numeric) %>%
  mutate_all(~ . > 5) %>%
  transmute(n_greater = reduce(., `+`),
            n_less = ncol(.) - n_greater) %>%
  bind_cols(iris, .)
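
select_if and mutate_all are superseded in recent dplyr releases. Assuming dplyr 1.0.0 or later, the same row-wise count can probably be written without leaving the original data frame by combining across() with where(). This is a sketch of that variant, not something from the original answer, and res is an illustrative name.

```r
library(dplyr)

res <- iris %>%
  mutate(n_greater = rowSums(across(where(is.numeric)) > 5),
         # n_greater is itself numeric by now, so the second selection
         # names the original measurement columns explicitly
         n_less = rowSums(across(Sepal.Length:Petal.Width) < 5))

head(res)
```

The explicit column range in the second call matters: inside mutate, where(is.numeric) would also pick up the n_greater column created one line earlier.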

For completeness, a base R option (reusing nm1 from above) is:

iris$n_greater <- rowSums(iris[nm1] > 5)
iris$n_less <- length(nm1) - iris$n_greater
