dplyr case_when does not return the expected result when using multiple columns

Problem description

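I am trying to mutate a new column, id, based on matches found with case_when. When a match is found, an ID value should be added; otherwise the value should stay NA. This works well when the case_when tests involve only a few columns. However, when I use multiple columns, the output differs from what I expect. Below is a reproducible example.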

set.seed(10)
library(dplyr)
values <- c(0, 1)
country <- c("USA", "Germany","UK","Russia","China")
role <- c("admin", "developer","UI designer","HR","manager")
Df <- dplyr::tibble(
  cname = sample(country, 10, replace = TRUE),
  role = sample(role, 10, replace = TRUE),
  b = sample(values, 10, replace = TRUE),
  c = sample(values, 10, replace = TRUE),
  d = sample(values, 10, replace = TRUE),
  e = sample(values, 10, replace = TRUE),
  f = sample(values, 10, replace = TRUE),
  g = sample(values, 10, replace = TRUE)
)
Df
# A tibble: 10 x 8
   cname   role            b     c     d     e     f     g
   <chr>   <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 China   manager         0     0     1     0     1     1
 2 UK      developer       0     1     1     1     1     0
 3 USA     developer       0     0     0     0     1     0
 4 Germany HR              1     0     0     0     1     0
 5 UK      developer       0     1     1     1     0     0
 6 Germany admin           1     1     1     1     1     0
 7 Germany UI designer     1     0     0     1     0     0
 8 UK      HR              1     0     1     0     1     1
 9 USA     HR              1     0     1     0     1     1
10 China   manager         0     0     0     0     0     0

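Each department column needs an id: b = 1, c = 2, d = 3, e = 4, f = 5, g = 6. The expected output is shown below. I have dropped the other columns, but it is also fine if the columns holding the 0/1 values are kept.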

cname   role    Department  Department_id
China   manager     d         3
China   manager     f         5
China   manager     g         6
UK      developer   c         2
UK      developer   d         3
UK      developer   e         4
UK      developer   f         5
UK      developer   g         6
USA     developer   f         5

Created on 2021-09-01 by the reprex package (v2.0.1)

Tags: r, dplyr, tidyverse

Solution
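One likely reason a case_when-only approach falls short here (the question does not show the original case_when call, so this is an assumption about it): case_when returns exactly one value per input row, taken from the first condition that is TRUE. A row with several department columns set to 1 therefore receives only the first matching id and never expands into several rows. A minimal sketch, using a made-up two-row tibble named `toy`:

```r
library(dplyr)

# Hypothetical toy data: row 1 has both b and c set to 1.
toy <- tibble(b = c(1, 0), c = c(1, 1))

toy_ids <- toy %>%
  mutate(id = case_when(
    b == 1 ~ 1L,   # checked first
    c == 1 ~ 2L    # only reached when the b condition is not TRUE
  ))

toy_ids$id
# Row 1 gets id 1 (its c value is never consulted); row 2 gets id 2.
```

Getting one output row per matching department needs a reshape to long format first, which is what the approach below does.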


Update

Still based on camille's comment:

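library(dplyr)
library(tidyr)

df %>% 
  # Long format: one row per (cname, role, department column)
  pivot_longer(-c(cname, role), 
               names_to = "Departement",
               values_to = "Departement_ID") %>% 
  group_by(cname, role, Departement) %>% 
  # The id is the column's position in df minus the two leading columns
  summarise(Departement_ID = ifelse(any(Departement_ID == 1), 
                           which(names(df) == unique(Departement)) - 2, 
                           NA_integer_)) %>% 
  drop_na()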

This returns

# A tibble: 28 x 4
# Groups:   cname, role [8]
   cname   role      Departement Departement_ID
   <chr>   <chr>     <chr>                <dbl>
 1 China   developer d                        3
 2 China   developer e                        4
 3 China   developer f                        5
 4 Germany HR        b                        1
 5 Germany HR        c                        2
 6 Germany manager   b                        1
 7 Germany manager   c                        2
 8 Germany manager   d                        3
 9 Germany manager   e                        4
10 Germany manager   f                        5
11 Russia  manager   b                        1
12 Russia  manager   d                        3
13 UK      admin     b                        1
14 UK      admin     c                        2
15 UK      admin     e                        4
16 UK      admin     f                        5
17 UK      admin     g                        6
18 UK      developer b                        1
19 UK      developer c                        2
20 UK      developer d                        3
21 UK      developer f                        5
22 UK      manager   b                        1
23 UK      manager   c                        2
24 UK      manager   d                        3
25 UK      manager   e                        4
26 UK      manager   f                        5
27 USA     manager   e                        4
28 USA     manager   g                        6
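As a possibly simpler variant of the same idea, the data can be reshaped first and the id looked up with match(). This is a sketch rather than the answer's own code: the lookup vector `dept_cols`, the helper column `flag`, and the small tibble `small` are illustrative names. Unlike the summarise() version above, it keeps one output row per input row and department, so repeated cname/role pairs are not collapsed.

```r
library(dplyr)
library(tidyr)

# Illustrative data: the first two rows of the question's tibble.
small <- tibble(
  cname = c("China", "UK"),
  role  = c("manager", "developer"),
  b = c(0, 0), c = c(0, 1), d = c(1, 1),
  e = c(0, 1), f = c(1, 1), g = c(1, 0)
)

# Mapping from the question: b = 1, c = 2, ..., g = 6,
# encoded as the position of each name in this vector.
dept_cols <- c("b", "c", "d", "e", "f", "g")

result <- small %>%
  pivot_longer(all_of(dept_cols),
               names_to = "Department",
               values_to = "flag") %>%
  filter(flag == 1) %>%                                   # keep matches only
  mutate(Department_id = match(Department, dept_cols)) %>% # name -> position
  select(-flag)

result
```

Because the ids come from `dept_cols` rather than from column positions in the data frame, the mapping stays correct even if extra columns are added before the department columns.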

Data

df <- structure(list(cname = c("UK", "USA", "Germany", "Russia", "UK", 
"Germany", "Germany", "Germany", "China", "UK"), role = c("developer", 
"manager", "manager", "manager", "admin", "HR", "developer", 
"manager", "developer", "manager"), b = c(1, 0, 0, 1, 1, 1, 0, 
1, 0, 1), c = c(1, 0, 1, 0, 1, 1, 0, 0, 0, 1), d = c(1, 0, 1, 
1, 0, 0, 0, 0, 1, 1), e = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 1), f = c(1, 
0, 1, 0, 1, 0, 0, 1, 1, 1), g = c(0, 1, 0, 0, 1, 0, 0, 0, 0, 
0)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
