R: Replacing data by group in a data frame

Problem description

I have a dataset that looks like this:

id1  id2  start_line end_line content   
A    B    1          1        "aaaa" 
A    B    4          4        "aa mm" 
A    B    5          5        "boool"
A    B    6          6        "omw"   
C    D    6          6        "hear!" 
C    D    7          7        " me out!"
C    D    21         21       "hello"   
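
For anyone who wants to reproduce the problem, the sample above can be rebuilt as a data frame. This is only a sketch: the name df is an assumption chosen to match the solution below, and the line columns are assumed to be integers.

```r
# Rebuild the sample data shown above.
# Assumption: the object is called `df` and the line columns are integers.
df <- data.frame(
  id1        = c("A", "A", "A", "A", "C", "C", "C"),
  id2        = c("B", "B", "B", "B", "D", "D", "D"),
  start_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  end_line   = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  content    = c("aaaa", "aa mm", "boool", "omw", "hear!", " me out!", "hello"),
  stringsAsFactors = FALSE
)
```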

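I need to mutate it several times according to specific criteria. In particular, rows that share the same id1, the same id2, and have consecutive start_line values should be treated as one group.

So the expected result is: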

id1  id2  start_line end_line content      real_line   cid
A    B    1          1        "aaaa"        1          1
A    B    4          6        "aa mm"       4          2
A    B    4          6        "boool"       5          2
A    B    4          6        "omw"         6          2
C    D    6          7        "hear!"       6          3
C    D    6          7        " me out!"    7          3
C    D    21         21       "hello"       21         4

I can add real_line simply by copying the original column, but I don't know how to replace start_line and end_line without summarising.

Tags: r, dataframe, tidyverse, dplyr

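Solution

We group by 'id1' and 'id2', then add a second grouping level based on the difference between adjacent 'start_line' values: a new group starts whenever that difference is not 1. Within each group, 'start_line' is replaced by its first value and 'end_line' by its last value, and cur_group_id() supplies 'cid'.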

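library(dplyr)

df %>%
  # outer grouping: rows must share id1 and id2
  group_by(id1, id2) %>%
  # inner grouping: a new run starts whenever start_line does not
  # follow the previous row by exactly 1
  group_by(grp = cumsum(c(TRUE, diff(start_line) != 1)), .add = TRUE) %>%
  mutate(real_line = start_line,          # keep the original line number
         start_line = first(start_line),  # first line of the run
         end_line = last(end_line)) %>%   # last line of the run
  mutate(cid = cur_group_id()) %>%        # one id per (id1, id2, grp) group
  ungroup() %>%
  select(-grp)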

Output

# A tibble: 7 x 7
#  id1   id2   start_line end_line content    real_line   cid
#  <chr> <chr>      <int>    <int> <chr>          <int> <int>
#1 A     B              1        1 "aaaa"             1     1
#2 A     B              4        6 "aa mm"            4     2
#3 A     B              4        6 "boool"            5     2
#4 A     B              4        6 "omw"              6     2
#5 C     D              6        7 "hear!"            6     3
#6 C     D              6        7 " me out!"         7     3
#7 C     D             21       21 "hello"           21     4
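
The grouping key in the solution relies on the cumsum(c(TRUE, diff(x) != 1)) idiom. As a rough illustration, here it is in base R on the start_line values of the A/B rows alone. The variable names new_run, run_id, run_start and run_end are made up for this sketch, and ave() merely stands in for the grouped mutate().

```r
# start_line values of the id1 = "A", id2 = "B" rows from the sample data
start_line <- c(1L, 4L, 5L, 6L)

# diff() gives the gap to the previous row. The first row has no
# predecessor, so TRUE is prepended to make it open a run.
new_run <- c(TRUE, diff(start_line) != 1)

# The cumulative sum of the logical vector numbers the runs: 1 2 2 2
run_id <- cumsum(new_run)

# ave() applies a function within each run and recycles the result,
# similar to first()/last() inside a grouped mutate()
run_start <- ave(start_line, run_id, FUN = function(x) x[1])          # 1 4 4 4
run_end   <- ave(start_line, run_id, FUN = function(x) x[length(x)])  # 1 6 6 6
```

Because the real data has several id1/id2 pairs, the solution computes this key inside group_by(id1, id2), so the first row of each pair always opens a new run.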
