How to extract information for one group based on a filter applied to another group in dplyr

Problem description

My data frame looks like this, but with thousands of rows:

type <- rep(c("A","B","C"),4)
time <- c(0,0,0,1,1,1,2,2,2,3,3,3)
counts <- c(0,30,15,30,30,10,31,30,8,30,8,0)
df <- data.frame(time,type,counts)
df 

  time type counts
1     0    A      0
2     0    B     30
3     0    C     15
4     1    A     30
5     1    B     30
6     1    C     10
7     2    A     31
8     2    B     30
9     2    C      8
10    3    A     30
11    3    B      8
12    3    C      0

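At every time point greater than 0, I want to extract all types that have counts == 30, and then, for those same types, extract their counts at the next time point.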

I would like the result to look like this:

time type counts time_after type_after counts_after
   1    A     30          2          A           31
   1    B     30          2          B           30
   2    B     30          3          B            8

Any help or guidance is appreciated.

Tags: r, dplyr, tidyverse, tidy

Solution


Not very elegant, but it should do the job:

library(dplyr)

type <- rep(c("A","B","C"),4)
time <- c(0,0,0,1,1,1,2,2,2,3,3,3)
counts <- c(0,30,15,30,30,10,31,30,8,30,8,0)
df <- tibble(time,type,counts)
df 
#> # A tibble: 12 x 3
#>     time type  counts
#>    <dbl> <chr>  <dbl>
#>  1     0 A          0
#>  2     0 B         30
#>  3     0 C         15
#>  4     1 A         30
#>  5     1 B         30
#>  6     1 C         10
#>  7     2 A         31
#>  8     2 B         30
#>  9     2 C          8
#> 10     3 A         30
#> 11     3 B          8
#> 12     3 C          0

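# Keep rows with counts == 30 at time points after 0,
# and record the time point whose counts we want to look up
thirties <- df %>% 
  filter(counts == 30 & time != 0) %>% 
  mutate(time_after = time + 1)

# Look up the same type at time_after (matched on both time and type),
# then rename the columns to the requested layout
inner_join(thirties, df, by = c("time_after" = "time",
                                "type" = "type")) %>%
  select(time,
         type = type,
         counts = counts.x,
         time_after,
         type_after = type,
         count_after = counts.y)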
#> # A tibble: 3 x 6
#>    time type  counts time_after type_after count_after
#>   <dbl> <chr>  <dbl>      <dbl> <chr>            <dbl>
#> 1     1 A         30          2 A                   31
#> 2     1 B         30          2 B                   30
#> 3     2 B         30          3 B                    8
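A possible alternative, sketched under the assumption that "the next time point" means the next row recorded for the same type (rather than exactly time + 1): sort the rows within each type and use dplyr's `lead()` to pull the following row's time and counts. The name `result` is just for illustration. Unlike the time + 1 join, this picks up the next available time point even if a type skips one, so the two approaches can differ on irregular data.

```r
library(dplyr)

# Example data from the question
df <- tibble(time   = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3),
             type   = rep(c("A", "B", "C"), 4),
             counts = c(0, 30, 15, 30, 30, 10, 31, 30, 8, 30, 8, 0))

result <- df %>%
  group_by(type) %>%
  # order each type's rows by time so lead() returns the next time point
  arrange(time, .by_group = TRUE) %>%
  mutate(time_after   = lead(time),
         type_after   = type,
         counts_after = lead(counts)) %>%
  ungroup() %>%
  # keep counts == 30 after time 0, dropping rows with no later time point
  filter(time > 0, counts == 30, !is.na(time_after))

result
```

Removing the `!is.na(time_after)` condition would also keep the time 3, type A row, with `NA` for `time_after` and `counts_after`, instead of dropping it as the inner join does.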
