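r - dplyr case_when does not return the expected result when using multiple columns
Problem description
I am trying to mutate a new id column based on matches found with case_when. When a match is found, an ID value is added; otherwise the value stays NA. This works well when case_when tests only a few columns. However, when I use multiple columns, the output differs from what I expect. Below is a reproducible example: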
set.seed(10)
library(dplyr)
values <- c(0, 1)
country <- c("USA", "Germany","UK","Russia","China")
role <- c("admin", "developer","UI designer","HR","manager")
Df <- dplyr::tibble(
cname = sample(country, 10, replace = TRUE),
role = sample(role, 10, replace = TRUE),
b = sample(values, 10, replace = TRUE),
c = sample(values, 10, replace = TRUE),
d = sample(values, 10, replace = TRUE),
e = sample(values, 10, replace = TRUE),
f = sample(values, 10, replace = TRUE),
g = sample(values, 10, replace = TRUE)
)
Df
# A tibble: 10 x 8
cname role b c d e f g
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 China manager 0 0 1 0 1 1
2 UK developer 0 1 1 1 1 0
3 USA developer 0 0 0 0 1 0
4 Germany HR 1 0 0 0 1 0
5 UK developer 0 1 1 1 0 0
6 Germany admin 1 1 1 1 1 0
7 Germany UI designer 1 0 0 1 0 0
8 UK HR 1 0 1 0 1 1
9 USA HR 1 0 1 0 1 1
10 China manager 0 0 0 0 0 0
Each department column needs an id:
b = 1, c = 2, d = 3, e = 4, f = 5, g = 6
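The mapping above could be held in a named vector so that ids are looked up by column name. This is only a minimal sketch; the name `dept_ids` is hypothetical and does not appear in the original post.

```r
# Hypothetical lookup vector: department column name -> id,
# following the mapping listed above
dept_ids <- c(b = 1L, c = 2L, d = 3L, e = 4L, f = 5L, g = 6L)

# Look up several ids by column name; unname() drops the names
ids <- unname(dept_ids[c("d", "f", "g")])
ids
```

Indexing by name keeps the mapping explicit instead of relying on column positions.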
Expected output. I have dropped the other columns here, but it is also fine if the columns holding the 0/1 values are kept.
cname role Department Department_id
China manager d 3
China manager f 5
China manager g 6
UK developer c 2
UK developer d 3
UK developer e 4
UK developer f 5
UK developer g 6
USA developer f 5
Created on 2021-09-01 by the reprex package (v2.0.1)
Solution
Update
Still based on Camille's comment:
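library(dplyr)
library(tidyr)  # provides pivot_longer() and drop_na()

df %>%
  # reshape to one row per (cname, role, department column)
  pivot_longer(-c(cname, role),
               names_to = "Departement",
               values_to = "Departement_ID") %>%
  group_by(cname, role, Departement) %>%
  # id = position of the column in df, minus the two leading columns (cname, role)
  summarise(Departement_ID = ifelse(any(Departement_ID == 1),
                                    which(names(df) == unique(Departement)) - 2,
                                    NA_integer_)) %>%
  # drop departments that never had a 1 within the group
  drop_na()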
This returns
# A tibble: 28 x 4
# Groups: cname, role [8]
cname role Departement Departement_ID
<chr> <chr> <chr> <dbl>
1 China developer d 3
2 China developer e 4
3 China developer f 5
4 Germany HR b 1
5 Germany HR c 2
6 Germany manager b 1
7 Germany manager c 2
8 Germany manager d 3
9 Germany manager e 4
10 Germany manager f 5
11 Russia manager b 1
12 Russia manager d 3
13 UK admin b 1
14 UK admin c 2
15 UK admin e 4
16 UK admin f 5
17 UK admin g 6
18 UK developer b 1
19 UK developer c 2
20 UK developer d 3
21 UK developer f 5
22 UK manager b 1
23 UK manager c 2
24 UK manager d 3
25 UK manager e 4
26 UK manager f 5
27 USA manager e 4
28 USA manager g 6
Data
df <- structure(list(cname = c("UK", "USA", "Germany", "Russia", "UK",
"Germany", "Germany", "Germany", "China", "UK"), role = c("developer",
"manager", "manager", "manager", "admin", "HR", "developer",
"manager", "developer", "manager"), b = c(1, 0, 0, 1, 1, 1, 0,
1, 0, 1), c = c(1, 0, 1, 0, 1, 1, 0, 0, 0, 1), d = c(1, 0, 1,
1, 0, 0, 0, 0, 1, 1), e = c(0, 1, 0, 0, 1, 0, 0, 1, 1, 1), f = c(1,
0, 1, 0, 1, 0, 0, 1, 1, 1), g = c(0, 1, 0, 0, 1, 0, 0, 0, 0,
0)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))