How to keep columns after `summarise` operation in `dplyr`

Problem description

I have this type of data:

df <- data.frame(name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata", 
                          "Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
                 value1 = c(1,2,3,4,5),
                 value2 = c(2,3,4,5,6))

I want to summarise columns value1 and value2 based on the matched parts of column name and keep the unique values of the new column author. This code only does the summarising part, but author is gone:

library(dplyr)
library(stringr)

df %>%
  mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
         name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
  group_by(name1) %>%
  summarise(across(c(value1, value2), sum))

# A tibble: 3 x 3
  name1                     value1 value2
* <chr>                      <dbl>  <dbl>
1 Acer laurinum                  3      5
2 Acmella paniculata             3      4
3 Adinandra cf. integerrima      9     11

Expected output:

# A tibble: 3 x 4
  name1                     value1 value2 author
* <chr>                      <dbl>  <dbl> <chr>
1 Acer laurinum                  3      5 Hassk.
2 Acmella paniculata             3      4 <NA>
3 Adinandra cf. integerrima      9     11 T.Anderson

Tags: r, dplyr

Solution


You can use na.omit(author)[1] inside summarise to take the first non-NA value of author in each group. If every author in a group is NA, the result for that group is NA.

library(dplyr)
library(stringr)

df %>%
  mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
         name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
  group_by(name1) %>%
  summarise(across(c(value1, value2), sum), 
            author = na.omit(author)[1])

#  name1                     value1 value2 author    
#  <chr>                      <dbl>  <dbl> <chr>     
#1 Acer laurinum                  3      5 Hassk.    
#2 Acmella paniculata             3      4 NA        
#3 Adinandra cf. integerrima      9     11 T.Anderson
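
To see why na.omit(author)[1] gives one value per group, here is a minimal base R sketch of how the expression behaves on a single group's author values. The helper name first_non_na is hypothetical and used only for illustration; it is not part of dplyr or of the answer above.

```r
# Hypothetical helper wrapping the expression used inside summarise().
# na.omit() drops the NA entries; [1] then takes the first remaining element.
# Indexing an empty vector with [1] yields NA, so all-NA groups stay NA.
first_non_na <- function(x) na.omit(x)[1]

# A group with one missing and one present author
first_non_na(c(NA, "Hassk."))

# A group where the author is always missing
first_non_na(NA_character_)

# A group with several authors: only the first non-NA one is kept
first_non_na(c("T.Anderson", NA, "Hassk."))
```

Note that if a group contains more than one distinct author, this keeps only the first and silently drops the rest, which is fine for the data shown here but worth checking on larger inputs.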
