Conditional replacement in a vector

Problem description

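I am trying to replace all values in a vector with NA (by group) if at least two of the values with x between 2 and 3 are greater than 4. In this example, group a has 2 values greater than 4 where 2 <= x <= 3, so every val2 in group a should become NA, while group b should keep its original values. My attempt below gets group a right but fills group b with 1, 1, 1 instead of 1, 2, 1: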

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1)
) %>%
  group_by(grp) %>%
  mutate(val2 = ifelse(sum(val[between(x, 2, 3)] > 4) >= 2, NA, val))
#> # A tibble: 6 × 4
#> # Groups:   grp [2]
#>   grp       x   val  val2
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         1     4    NA
#> 2 a         2     5    NA
#> 3 a         3     6    NA
#> 4 b         1     1     1
#> 5 b         2     2     1
#> 6 b         3     1     1

Expected output

tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1),
  val2 = c(NA, NA, NA, 1, 2, 1)
)
#> # A tibble: 6 × 4
#>   grp       x   val  val2
#>   <chr> <dbl> <dbl> <dbl>
#> 1 a         1     4    NA
#> 2 a         2     5    NA
#> 3 a         3     6    NA
#> 4 b         1     1     1
#> 5 b         2     2     2
#> 6 b         3     1     1

Created on 2021-10-25 by the reprex package (v2.0.1)

Tags: r, dplyr

Solution


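The problem is that ifelse returns a vector whose length equals that of its first argument (the test). Since sum(val[between(x, 2, 3)] > 4) >= 2 returns a logical vector of length 1, ifelse returns a single value: NA when the condition is TRUE, or only the first element of val when it is FALSE. mutate then recycles that single value to the full length of the group, which is why group b ends up as 1, 1, 1. For example, ifelse(TRUE, 1:3, 11:13) returns only 1. You can use rep to repeat the condition once per row of the group: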

mutate(val2 = ifelse(rep(sum(val[between(x, 2, 3)] > 4) >= 2, n()), NA, val))
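The length behaviour behind this fix can be checked in base R, without dplyr. The following is only a minimal sketch; the variable names short and full are illustrative and do not come from the question.

```r
# ifelse() shapes its result after `test`, not after `yes` or `no`.
short <- ifelse(TRUE, 1:3, 11:13)          # test has length 1
full  <- ifelse(rep(TRUE, 3), 1:3, 11:13)  # test has length 3

print(short)  # 1
print(full)   # 1 2 3

# With a FALSE test of length 1, only the first element of `no` is used.
print(ifelse(FALSE, 1:3, 11:13))  # 11
```

Repeating the condition with rep(..., n()) inside a grouped mutate plays the same role as rep(TRUE, 3) here: it gives the test one element per row of the group.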

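Or use a standard if/else statement, which evaluates the condition once per group and returns either a single NA or the whole val vector: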

mutate(val2 = if(sum(val[between(x, 2, 3)] > 4) >= 2) NA else val)
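As a rough end-to-end check, the if/else version can be applied to the sample data from the question. This sketch assumes a dplyr version that can combine the logical NA from one group with the numeric values from the other (1.0 or later should do); the names df and result are illustrative.

```r
library(dplyr)

# Sample data copied from the question.
df <- tibble(
  grp = c("a", "a", "a", "b", "b", "b"),
  x   = c(1, 2, 3, 1, 2, 3),
  val = c(4, 5, 6, 1, 2, 1)
)

result <- df %>%
  group_by(grp) %>%
  # One condition per group: return a single NA (recycled by mutate)
  # or the group's whole `val` vector.
  mutate(val2 = if (sum(val[between(x, 2, 3)] > 4) >= 2) NA else val) %>%
  ungroup()

print(result$val2)  # NA NA NA 1 2 1
```

Group a has two values above 4 (5 and 6) at x = 2 and x = 3, so all of its rows become NA. Group b has none, so it keeps 1, 2, 1, which matches the expected output.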
