Concatenate values in one column after each unique value in another column

Problem description

I have a tibble that looks like this:

library(tidyverse)

df <- tibble(table_name = c("horse", "x", "x", "x", "dog", "x", "rat", "x", "x", "x", "x", "x"),
             value_str = c(NA, "a", "b", "c", NA, "a", NA, "b", "d", "e", "f", "g"))
    > df
    # A tibble: 12 x 2
        table_name value_str
         <chr>      <chr>    
    1 horse      <NA>     
    2 x          a        
    3 x          b        
    4 x          c        
    5 dog        <NA>     
    6 x          a        
    7 rat        <NA>     
    8 x          b        
    9 x          d        
   10 x          e        
   11 x          f        
   12 x          g 

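I want to generate separate vectors for "horse", "dog" and "rat", each containing the value_str entries from the rows below "horse" up to "dog", from "dog" up to "rat", and from "rat" to the end. I would like the output to look like the following vectors: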

vec_horse <- tibble(horse = c("a", "b", "c")) %>% pull(., horse)
vec_dog <- tibble(dog = c("a")) %>% pull(., dog)
vec_rat <- tibble(rat = c("b", "d", "e", "f", "g")) %>% pull(., rat)
    > vec_horse
    [1] "a" "b" "c"

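I would normally use group_by() on df$table_name, but that does not work here, because the grouping depends on the position of the values in df$value_str.

I also can't collapse df$value_str into a single vector, because the output needs to be a separate vector for each unique category in df$table_name.

Thanks in advance!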

Tags: r, dplyr, tibble

Solution


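# Turn the placeholder "x" into NA, then carry each animal name down
# over the rows that follow it.
d <- df %>%
    mutate(table_name = if_else(table_name == "x", NA_character_, table_name)) %>%
    fill(table_name) %>%
    # Collect the unique non-missing values of each group into a list-column.
    group_by(table_name) %>%
    summarise(value_str = list(unique(value_str[!is.na(value_str)]))) %>%
    ungroup()

# Named list with one character vector per table_name.
setNames(d$value_str, d$table_name)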
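
For comparison, here is a minimal base R sketch of the same idea: label every row with the most recent non-"x" name, then split value_str by that label. It is not part of the original answer. The names is_header, block, keep and vecs are made up for illustration. It assumes the first row is always a header row (not "x"). Unlike the pipeline above, it keeps duplicate values and returns the groups in order of appearance rather than alphabetically.

```r
# Same data as in the question, built with base R so the sketch has no dependencies.
df <- data.frame(
  table_name = c("horse", "x", "x", "x", "dog", "x", "rat", "x", "x", "x", "x", "x"),
  value_str  = c(NA, "a", "b", "c", NA, "a", NA, "b", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)

# A header row is any row whose table_name is not the placeholder "x".
is_header <- df$table_name != "x"

# cumsum() over the header flags gives a running block id (1, 1, 1, 1, 2, 2, 3, ...),
# which indexes into the header names to label every row with its block.
# Assumption: the first row is a header, so the id never starts at 0.
block <- df$table_name[is_header][cumsum(is_header)]

# Drop rows without a value (the header rows) and split the rest by block label.
# factor(levels = unique(block)) keeps the blocks in order of appearance.
keep <- !is.na(df$value_str)
vecs <- split(df$value_str[keep], factor(block[keep], levels = unique(block)))

vecs$horse  # "a" "b" "c"
vecs$dog    # "a"
vecs$rat    # "b" "d" "e" "f" "g"
```

Individual vectors can then be pulled out by name, for example vec_horse <- vecs$horse. If duplicates within a block should be dropped as in the answer above, lapply(vecs, unique) would do that.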
