r - Add columns to a data frame for the values in an existing column
Problem description
Data frame:
Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah", "Herbert", "Herbert")
Procedure <- c("1", "1", "2", "3", "3", "4", "1", "1", "1")
Location <- c("a", "a", "a", "b", "b", "b", "c", "a", "a")
(df <- data.frame(Case, Procedure, Location))
        Case Procedure Location
1 Siddhartha         1        a
2 Siddhartha         1        a
3 Siddhartha         2        a
4       Paul         3        b
5       Paul         3        b
6       Paul         4        b
7     Hannah         1        c
8    Herbert         1        a
9    Herbert         1        a
Function:
df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  arrange(desc(Anzahl))
Result:
  Procedure Location Anzahl
  <fct>     <fct>     <int>
1 1         a             2
2 1         c             1
3 2         a             1
4 3         b             1
5 4         b             1
What I need:
# A tibble: 4 x 4
  Procedure     a     b     c
  <fct>     <int> <int> <int>
1 1             2     0     1
2 2             1     0     0
3 3             0     1     0
4 4             0     1     0
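So I want to sort the data frame by Procedure and Location, with one column per location. This is what I tried:
df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))
But:
Error: This tidyselect interface doesn't support predicates yet.
i Contact the package author and suggest using `eval_select()`.
I tried to solve this in other questions I asked earlier (at this point it almost feels like spamming), but I could not apply those solutions to my original data frame. The functions shown above (group_by, summarise) also work on the original data frame. The only problem is that the result is not sorted by location.
Regards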
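Solution
This should work. The summarised column is called Anzahl, not n, so both values_from and values_fill have to refer to Anzahl: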
df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  arrange(Location, desc(Anzahl)) %>%
  pivot_wider(names_from = Location, values_from = Anzahl, values_fill = list(Anzahl = 0))
This gives us:
  Procedure     a     b     c
  <chr>     <int> <int> <int>
1 1             2     0     1
2 2             1     0     0
3 3             0     1     0
4 4             0     1     0
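For reference, here is a self-contained sketch of the same pipeline that can be pasted into a fresh session. It assumes dplyr 1.0.0 or later (for the `.groups` argument) and tidyr 1.1.0 or later (for a scalar `values_fill`); the name `wide` and the final `arrange(Procedure)` are additions for illustration and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data frame from the question.
df <- data.frame(
  Case      = c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul",
                "Paul", "Hannah", "Herbert", "Herbert"),
  Procedure = c("1", "1", "2", "3", "3", "4", "1", "1", "1"),
  Location  = c("a", "a", "a", "b", "b", "b", "c", "a", "a")
)

wide <- df %>%
  group_by(Procedure, Location) %>%
  # Count distinct cases per Procedure/Location pair and drop the grouping.
  summarise(Anzahl = n_distinct(Case), .groups = "drop") %>%
  # Sorting by Location first fixes the order of the new columns (a, b, c).
  arrange(Location) %>%
  pivot_wider(
    names_from  = Location,
    values_from = Anzahl,   # must name the summarised column, not `n`
    values_fill = 0L        # fill missing combinations with integer zero
  ) %>%
  # Hypothetical extra step: keep the rows ordered by Procedure.
  arrange(Procedure)

print(wide)
```

The error in the question most likely arises because `n` is not a column of the summarised data, so the selection falls back to the dplyr function `n()`, and tidyselect treats a bare function as a predicate. Pointing `values_from` at `Anzahl` avoids that lookup entirely.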