"Tagging" duplicates in R

Problem description

I am working with the R programming language. Suppose I have the following data:

Data_I_Have <- data.frame(
    "Person" = c("John", "John", "John", "Peter", "Peter", "Peter", "Tim", "Kevin", "Adam", "Adam", "Xavier"),
    "Number_of_Kids" = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)

  Person Number_of_Kids
1    John              4
2    John              1
3    John              1
4   Peter              5
5   Peter              2
6   Peter              3
7     Tim              7
8   Kevin              0
9    Adam              3
10   Adam              3
11 Xavier              5

Is it possible to "tag" each duplicated name so that the data looks like the following (e.g. John_1, John_2, and so on)?

Data_I_Want <- data.frame(
    "Person" = c("John_1", "John_2", "John_3", "Peter_1", "Peter_2", "Peter_3", "Tim", "Kevin", "Adam_1", "Adam_2", "Xavier"),
    "Number_of_Kids" = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)

   Person Number_of_Kids
1   John_1              4
2   John_2              1
3   John_3              1
4  Peter_1              5
5  Peter_2              2
6  Peter_3              3
7      Tim              7
8    Kevin              0
9   Adam_1              3
10  Adam_2              3
11  Xavier              5

Following the earlier question "Add specific characters to duplicated strings", I tried the approach used there:

Data_I_Want <-  make.unique(Data_I_Have, sep = '_')

But this gives me the following error:

Error in make.unique(Data_I_Have, sep = "_") : 
  'names' must be a character vector

Can someone tell me how to fix this?

Thanks!

Tags: r, duplicates, data-manipulation

Solution


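make.unique expects a character vector, not a data.frame. By default it also appends the suffixes 1, 2, 3 (joined with sep, which defaults to ".") only to the repeated values, leaving the first occurrence unchanged rather than numbering from the start, i.e.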

> make.unique(Data_I_Have$Person)
 [1] "John"    "John.1"  "John.2"  "Peter"   "Peter.1" "Peter.2" "Tim"     "Kevin"   "Adam"    "Adam.1"  "Xavier" 
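
As a small sketch of why passing sep = "_" on the column alone is still not enough: the call below runs without error, but the first occurrence of each name stays untagged and numbering starts at 1 on the second occurrence. The vector name Person and the variable tagged are just illustrative here.

```r
# The Person column from the question, as a plain character vector
Person <- c("John", "John", "John", "Peter", "Peter", "Peter",
            "Tim", "Kevin", "Adam", "Adam", "Xavier")

# make.unique() works on a character vector; sep controls the joiner
tagged <- make.unique(Person, sep = "_")
print(tagged)
# "John" "John_1" "John_2" "Peter" "Peter_1" "Peter_2"
# "Tim" "Kevin" "Adam" "Adam_1" "Xavier"
```

So the error goes away, but the result is "John", "John_1", "John_2" rather than the wanted "John_1", "John_2", "John_3".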

To get the desired output, group by "Person", paste the group column together with row_number() in groups that have more than one row, and then ungroup().

library(dplyr)
library(stringr)
Data_I_Have %>%
    group_by(Person) %>%
    # tag only names that occur more than once; leave unique names as they are
    mutate(Person = case_when(n() > 1 ~
        str_c(Person, "_", row_number()), TRUE ~ Person)) %>%
    ungroup()

-output

# A tibble: 11 x 2
   Person  Number_of_Kids
   <chr>   <chr>         
 1 John_1  4             
 2 John_2  1             
 3 John_3  1             
 4 Peter_1 5             
 5 Peter_2 2             
 6 Peter_3 3             
 7 Tim     7             
 8 Kevin   0             
 9 Adam_1  3             
10 Adam_2  3             
11 Xavier  5     
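
If adding dplyr and stringr is not an option, the same grouping idea can probably be expressed in base R with ave(). This is only a sketch, not part of the original answer: tag_duplicates is a hypothetical helper name introduced here, and it assumes the column can be treated as a character vector with no missing values.

```r
Data_I_Have <- data.frame(
  Person = c("John", "John", "John", "Peter", "Peter", "Peter",
             "Tim", "Kevin", "Adam", "Adam", "Xavier"),
  Number_of_Kids = c("4", "1", "1", "5", "2", "3", "7", "0", "3", "3", "5")
)

# Hypothetical helper: number every member of a duplicated group,
# starting from 1, and leave values that occur only once untouched.
tag_duplicates <- function(x, sep = "_") {
  x <- as.character(x)
  # position of each element within its group of identical values
  idx <- ave(seq_along(x), x, FUN = seq_along)
  # size of the group each element belongs to
  n <- ave(seq_along(x), x, FUN = length)
  ifelse(n > 1, paste(x, idx, sep = sep), x)
}

Data_I_Want <- Data_I_Have
Data_I_Want$Person <- tag_duplicates(Data_I_Have$Person)
print(Data_I_Want)
```

The two ave() calls play the roles of row_number() and n() in the dplyr version: one gives the position within the group, the other the group size used to decide whether a name needs a tag at all.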
