Add a new column containing the names of the columns whose value in that row is TRUE or another specific value

Problem description

I have this data frame:

Name    Pr  VP  Tr  Me  Sa  Ar
Alicia  1   0   0   0   1   0
Bonnie  0   1   1   0   0   0
Cathy   1   1   1   1   1   1
Daphne  1   0   0   0   0   1
Elena   0   0   0   1   1   1
Faye    0   0   0   0   0   1

I want to produce this data frame, which adds a column listing, for each row, the names of the columns whose value is 1:

Name    Pr  VP  Tr  Me  Sa  Ar  Nominations
Alicia  1   0   0   0   1   0   Pr, Sa
Bonnie  0   1   1   0   0   0   VP, Tr
Cathy   1   1   1   1   1   1   Pr, VP, Tr, Me, Sa, Ar
Daphne  1   0   0   0   0   1   Pr, Ar
Elena   0   0   0   1   1   1   Me, Sa, Ar
Faye    0   0   0   0   0   1   Ar

I prefer the tidyverse, but a base R approach would also be useful to know.

Tags: r, dplyr

Solution


We can use apply with MARGIN = 1 to loop over the rows and paste together the 'names' of the elements where 'x' is 1

df1$Nominations <- apply(df1[-1], 1, function(x) toString(names(x)[x == 1]))
df1$Nominations
#[1] "Pr, Sa"                 "VP, Tr"                 "Pr, VP, Tr, Me, Sa, Ar" "Pr, Ar"                
#[5] "Me, Sa, Ar"             "Ar"   
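
Since the title also asks about matching "another specific value", the same base R idea can be wrapped in a small function. This is only a sketch: `names_where` and its `value` argument are hypothetical names introduced here, not part of the original answer, and it assumes `cols` is a character vector of column names. The data frame is rebuilt inline so the block runs on its own.

```r
# Rebuild the example data so the sketch is self-contained
df1 <- data.frame(
  Name = c("Alicia", "Bonnie", "Cathy", "Daphne", "Elena", "Faye"),
  Pr = c(1L, 0L, 1L, 1L, 0L, 0L),
  VP = c(0L, 1L, 1L, 0L, 0L, 0L),
  Tr = c(0L, 1L, 1L, 0L, 0L, 0L),
  Me = c(0L, 0L, 1L, 0L, 1L, 0L),
  Sa = c(1L, 0L, 1L, 0L, 1L, 0L),
  Ar = c(0L, 0L, 1L, 1L, 1L, 1L)
)

# Hypothetical helper (assumption, not from the original answer):
# for each row, collect the names of `cols` whose entry equals `value`.
names_where <- function(df, cols, value = 1) {
  hit <- as.matrix(df[cols] == value)  # logical matrix, one column per entry of cols
  hit[is.na(hit)] <- FALSE             # treat missing values as "no match"
  vapply(
    seq_len(nrow(hit)),
    function(i) toString(cols[hit[i, ]]),
    character(1)
  )
}

flag_cols <- setdiff(names(df1), "Name")
df1$Nominations <- names_where(df1, flag_cols, value = 1)
```

Rows with no matching column get an empty string, because `toString(character(0))` returns `""`. Passing a different `value` (for example `TRUE` for logical columns) reuses the same logic.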

Or, using the tidyverse, reshape to 'long' format with pivot_longer, group by 'Name', use summarise to paste the 'name' values where 'value' is 1, and join back to the original dataset

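library(dplyr)
library(tidyr)
df1 %>% 
   pivot_longer(cols = -Name) %>%   # long format: one row per Name/column pair
   group_by(Name) %>% 
   summarise(Nominations = toString(name[as.logical(value)])) %>%   # keep names where value is 1
   right_join(df1, by = "Name") %>%   # bring back the original columns
   select(names(df1), everything())   # restore the original column order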
# A tibble: 6 x 8
#  Name      Pr    VP    Tr    Me    Sa    Ar Nominations           
#  <chr>  <int> <int> <int> <int> <int> <int> <chr>                 
#1 Alicia     1     0     0     0     1     0 Pr, Sa                
#2 Bonnie     0     1     1     0     0     0 VP, Tr                
#3 Cathy      1     1     1     1     1     1 Pr, VP, Tr, Me, Sa, Ar
#4 Daphne     1     0     0     0     0     1 Pr, Ar                
#5 Elena      0     0     0     1     1     1 Me, Sa, Ar            
#6 Faye       0     0     0     0     0     1 Ar       
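
A possible variant of the pipeline above, sketched under the assumption that dplyr 1.0 or later and tidyr are installed: filter the long data down to the rows where 'value' is 1 before summarising, then `left_join` onto the original data frame so its row and column order are kept without a final `select`. The intermediate names `nominations` and `result` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data so the sketch is self-contained
df1 <- data.frame(
  Name = c("Alicia", "Bonnie", "Cathy", "Daphne", "Elena", "Faye"),
  Pr = c(1L, 0L, 1L, 1L, 0L, 0L),
  VP = c(0L, 1L, 1L, 0L, 0L, 0L),
  Tr = c(0L, 1L, 1L, 0L, 0L, 0L),
  Me = c(0L, 0L, 1L, 0L, 1L, 0L),
  Sa = c(1L, 0L, 1L, 0L, 1L, 0L),
  Ar = c(0L, 0L, 1L, 1L, 1L, 1L)
)

# One row per Name with the comma-separated names of the columns equal to 1
nominations <- df1 %>%
  pivot_longer(cols = -Name, names_to = "name", values_to = "value") %>%
  filter(value == 1) %>%
  group_by(Name) %>%
  summarise(Nominations = toString(name), .groups = "drop")

# left_join keeps every row of df1 in its original order
result <- left_join(df1, nominations, by = "Name")
```

One difference to be aware of: a row with no 1s at all is dropped by `filter`, so it ends up with `NA` in 'Nominations' after the join rather than an empty string.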

Data

df1 <- structure(list(Name = c("Alicia", "Bonnie", "Cathy", "Daphne", 
"Elena", "Faye"), Pr = c(1L, 0L, 1L, 1L, 0L, 0L), VP = c(0L, 
1L, 1L, 0L, 0L, 0L), Tr = c(0L, 1L, 1L, 0L, 0L, 0L), Me = c(0L, 
0L, 1L, 0L, 1L, 0L), Sa = c(1L, 0L, 1L, 0L, 1L, 0L), Ar = c(0L, 
0L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-6L))
