How do I merge rows whose names were mistyped and sum their respective values?

Problem description

The problematic df:

structure(list(names = c("species_1", "species_2", "species_3", 
"species1", "species3"), total = c(5, 3, 2, 2, 3)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

Expected df:

structure(list(names = c("species_1", "species_2", "species_3"
), total = c(7, 3, 5)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

I tried filtering by name and then using summarise and sum, but without success.

Tags: r, dplyr

Solution

We may need to correct 'names' by inserting a `_` between the letters and the digits, and then get the sum by grouping on 'names':

library(dplyr)
library(stringr)
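# Insert the missing "_" between letters and digits ("species1" -> "species_1"),
# then group by the corrected names and sum 'total'
df %>%
  group_by(names = str_replace(names, "([A-Za-z]+)(\\d+)", "\\1_\\2")) %>%
  summarise(total = sum(total))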

-output

# A tibble: 3 x 2
  names     total
  <chr>     <dbl>
1 species_1     7
2 species_2     3
3 species_3     5
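
To try the answer end to end, here is a minimal, self-contained sketch that rebuilds the input and expected data from the question and compares the two. The object names `result` and `expected` are introduced here for illustration only, and the sketch assumes that the only typo is a missing underscore directly between the letters and the digits.

```r
library(dplyr)
library(stringr)

# Input data as given in the question
df <- tibble(
  names = c("species_1", "species_2", "species_3", "species1", "species3"),
  total = c(5, 3, 2, 2, 3)
)

# Names that already contain "_" do not match the pattern and are left unchanged;
# names such as "species1" become "species_1" before grouping
result <- df %>%
  group_by(names = str_replace(names, "([A-Za-z]+)(\\d+)", "\\1_\\2")) %>%
  summarise(total = sum(total))

# Expected data as given in the question
expected <- tibble(
  names = c("species_1", "species_2", "species_3"),
  total = c(7, 3, 5)
)

# Stops with an error if the summarised result differs from the expected data
stopifnot(isTRUE(all.equal(as.data.frame(result), as.data.frame(expected))))
```

If the names can be mistyped in other ways (different case, extra spaces, other separators), the pattern would need to be adapted accordingly.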
