Subset dataframe based on indicator variables

Question

Using R, how can one subset a dataframe that has indicator variables, based on a vector of columns?

# Dataframe with 3 indicator variables - a, b, and c
df = data.frame(a = c(1, 0), b = c(1, 1), c = c(0, 1))

subset.iv = function (df, cols) {
    # ???
}

# Subset rows that match a or c (i.e. a=1 or c=1):
subset.iv(df, c('a', 'c'))

# Subset rows that match b (i.e. b=1):
subset.iv(df, c('b'))

I know how to subset a dataframe based on a known/static condition (e.g. df[df$a == 1 | df$b == 1,]).

But in this case the problem is that I can't write the condition expression since I don't know the number of columns to check for, or the columns themselves.

Also, subset doesn't allow passing a custom function where I might be able to parse the vector and check for columns.

Tags: r

Answer


Assuming a positive indicator value means a match and zero means no match, something like this might work:

subset.iv = function (df, cols) {
  # Keep rows whose selected indicator columns sum to more than zero
  df[rowSums(df[cols]) > 0, ]
}

which gives:

> subset.iv(df, c('a', 'c'))
  a b c
1 1 1 0
2 0 1 1
> subset.iv(df, c('b'))
  a b c
1 1 1 0
2 0 1 1
> subset.iv(df, c('c'))
  a b c
2 0 1 1
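
If the indicator columns might contain NA or values other than 0 and 1, the row-sum test can be misleading. A minimal sketch of an alternative, not part of the original answer, compares each column with 1 explicitly and combines the results with a logical OR. The name subset_iv_any is hypothetical, and treating NA as "no match" is an assumption:

```r
# Hypothetical variant: keep rows where any listed column equals 1.
subset_iv_any <- function(df, cols) {
  # One logical vector per selected column; NA is treated as "no match" (assumption)
  hits <- lapply(df[cols], function(x) !is.na(x) & x == 1)
  # OR the per-column vectors together, element by element
  keep <- Reduce(`|`, hits)
  # drop = FALSE keeps the result a data frame even if df has a single column
  df[keep, , drop = FALSE]
}

# Same example data as in the question
df <- data.frame(a = c(1, 0), b = c(1, 1), c = c(0, 1))
subset_iv_any(df, c("a", "c"))  # both rows match
subset_iv_any(df, "c")          # only row 2 matches
```

Because the vector of column names is only used to index df, the function works for any number of columns without building a condition expression by hand.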
