Subsetting a vector/data frame in R yields different results

Problem description

Recently, I was asked about subsetting a data frame in R. My colleague had this line of code:

dd2 <- subset(dd, tret == c("T1", "T2", "T3", "T4"))

It returns only a quarter of the expected subset. The standard form

dd2 <- subset(dd, tret == "T1" | tret == "T2" | tret == "T3" | tret == "T4")

yields 960 rows, whereas the first line yields only 240 rows.

The same thing happens with vectors. For instance, given

x <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)

the line

y <- x[x == 1 | x == 2]

gives a different vector from

y <- x[x == c(1,2)]

Any insight on the differences? Thank you.

Tags: r, subset

Solution


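The problem is value recycling: when == compares two vectors that both have length greater than 1, the shorter vector is recycled to the length of the longer one and the comparison is made element by element.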

x == 1:2
#[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

where

x
#[1] 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4

and the comparison is made against the recycled vector

rep(1:2, length.out = length(x))
#[1] 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2

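In the example above, 1 is compared with the first element of x, 2 with the second, 1 again with the third, 2 with the fourth, and so on until the end of x. That is why only the elements that happen to line up with the recycled pattern come back TRUE. To test whether each element matches any value in a vector of length > 1, use %in%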

identical(x[x == 1 | x == 2], x[x %in% 1:2])
#[1] TRUE
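The same fix carries over to the data frame case from the question. The original dd is not shown, so the sketch below uses a made-up stand-in with four rows per treatment. The column name value and the row counts are assumptions for illustration only.

```r
# Hypothetical stand-in for dd; the real data from the question is not available.
dd <- data.frame(
  tret  = rep(c("T1", "T2", "T3", "T4"), each = 4),
  value = seq_len(16),
  stringsAsFactors = FALSE
)

trets <- c("T1", "T2", "T3", "T4")

# == recycles trets along the rows, so only rows that happen to line up match.
wrong <- subset(dd, tret == trets)

# %in% tests each row's tret for membership in trets.
right <- subset(dd, tret %in% trets)

nrow(wrong)
#[1] 4
nrow(right)
#[1] 16
```

With this layout the == version keeps one row per treatment, a quarter of the rows, which mirrors the 240 versus 960 rows reported in the question.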
