Collapse multiple columns into a single column based on which one has a value

Question

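I am trying to collapse several columns into one, based on which column holds a particular value. Below is an example of how I did it by hand. I now need to do this for a much larger set of columns. Is there a smarter way to do it? I would prefer a tidyverse answer if possible.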

data <- structure(list(
  `Race: (choice=Asian)` = c("Unchecked", "Unchecked", 
    "Unchecked", "Unchecked", "Unchecked", "Unchecked"),
  `Race: (choice=Black)` = c("Unchecked", 
    "Unchecked", "Unchecked", "Unchecked", "Checked", "Checked"), 
  `Race: (choice=White)` = c("Checked", "Checked", "Checked", 
    "Unchecked", "Unchecked", "Unchecked"),
  `Race: (choice=Other)` = c("Unchecked", 
    "Unchecked", "Unchecked", "Checked", "Unchecked", "Unchecked"
), ID = 1:6), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

data$race=NA_character_
data[data$`Race: (choice=Other)`=="Checked",]$race="Other"
data[data$`Race: (choice=White)`=="Checked",]$race="White"
data[data$`Race: (choice=Black)`=="Checked",]$race="Black"
data[data$`Race: (choice=Asian)`=="Checked",]$race="Asian"

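As you can see above, I want the new column to hold the name of whichever column is checked. I know this can go wrong if more than one column is checked. I can deal with that before running the code, but an option that flags "multiple checked" would be even better.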

I can get part of the way there with group_by and cur_group_id, but it still does not fully solve the problem.

data %>% group_by(across(contains("Race:"))) %>%
  mutate(Race = cur_group_id() %>%
           recode("1" = "Black", "2" = "White", "3" = "Other",
                  "4" = "Asian", "5" = NA_character_))

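The problem is that I have to work out by hand which group ID corresponds to which column name. I am also worried that it may not be robust to changes in the order of the data.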

Tags: r, tidyverse

Answer


You can use max.col to get the first occurrence of 'Checked' in each row.

cols <- grep('Race', names(data), value = TRUE)
values <- cols[max.col(data[cols] == 'Checked', ties.method = 'first')]
values
#[1] "Race: (choice=White)" "Race: (choice=White)" "Race: (choice=White)"
#[4] "Race: (choice=Other)" "Race: (choice=Black)" "Race: (choice=Black)"
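
One caveat with max.col: when a row has no 'Checked' column at all, every entry is FALSE, the row is one big tie, and ties.method = 'first' returns the first column, so that row is silently labelled with the first choice. A row with several checked columns likewise reports only the first one. A minimal sketch of a guard, assuming a small stand-in data frame `demo` (its rows 3 and 4 are hypothetical and not from the question) and the label "Multiple" as an arbitrary choice:

```r
# Stand-in data: row 3 has nothing checked, row 4 has two columns checked.
demo <- data.frame(
  `Race: (choice=Asian)` = c("Unchecked", "Unchecked", "Unchecked", "Checked"),
  `Race: (choice=Black)` = c("Unchecked", "Checked",   "Unchecked", "Unchecked"),
  `Race: (choice=White)` = c("Checked",   "Unchecked", "Unchecked", "Checked"),
  ID = 1:4,
  check.names = FALSE   # keep the original column names with spaces and brackets
)

cols <- grep("Race", names(demo), value = TRUE)
checked <- demo[cols] == "Checked"   # logical matrix, one column per choice
n_checked <- rowSums(checked)        # number of checked boxes per row

values_safe <- cols[max.col(checked, ties.method = "first")]
values_safe[n_checked == 0] <- NA_character_   # nothing checked
values_safe[n_checked > 1]  <- "Multiple"      # more than one checked
values_safe
```

Rows with exactly one checked column are unaffected, so this can be dropped in front of the column-name clean-up without changing its result for the question's data.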

You may then want to clean up the column names:

values <- gsub('.*choice=|\\)$', '', values)
values
#[1] "White" "White" "White" "Other" "Black" "Black"
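
Since the question asks for a tidyverse approach and a "multiple checked" option, here is one possible sketch using tidyr::pivot_longer and dplyr. It is an alternative to max.col, not part of the answer above. The stand-in tibble `demo`, the helper names `race_lookup` and `result`, and the label "Multiple" are assumptions made for illustration; rows 3 and 4 are hypothetical cases with nothing checked and with two columns checked.

```r
library(dplyr)
library(tidyr)

# Stand-in data (hypothetical): row 3 has nothing checked, row 4 has two checked.
demo <- tibble(
  `Race: (choice=Asian)` = c("Unchecked", "Unchecked", "Unchecked", "Checked"),
  `Race: (choice=Black)` = c("Unchecked", "Checked",   "Unchecked", "Unchecked"),
  `Race: (choice=White)` = c("Checked",   "Unchecked", "Unchecked", "Checked"),
  ID = 1:4
)

race_lookup <- demo %>%
  # One row per ID and choice; the regex keeps only the text after "choice=".
  pivot_longer(
    starts_with("Race:"),
    names_to = "choice",
    names_pattern = "Race: \\(choice=(.*)\\)",
    values_to = "status"
  ) %>%
  group_by(ID) %>%
  summarise(
    n_checked = sum(status == "Checked"),
    race = case_when(
      n_checked == 0 ~ NA_character_,              # nothing checked
      n_checked > 1  ~ "Multiple",                 # several checked
      TRUE           ~ choice[status == "Checked"][1]
    )
  )

# Attach the collapsed column to the original wide data.
result <- demo %>%
  left_join(select(race_lookup, ID, race), by = "ID")

result$race
```

Because the labels are taken from the column names themselves, nothing has to be matched to a group ID by hand, and the result does not depend on row or column order. It does assume that ID uniquely identifies each row.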
