Add columns to a data frame for the values in an existing column

Problem description

Data frame:

Case <- c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul", "Hannah", "Herbert", "Herbert")
Procedure <- c("1", "1", "2", "3", "3", "4", "1", "1", "1")
Location <- c("a", "a", "a", "b", "b", "b", "c", "a", "a")

(df <- data.frame(Case, Procedure, Location))

        Case Procedure Location
1 Siddhartha         1        a
2 Siddhartha         1        a
3 Siddhartha         2        a
4       Paul         3        b
5       Paul         3        b
6       Paul         4        b
7     Hannah         1        c
8    Herbert         1        a
9    Herbert         1        a

Code:

df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  arrange(desc(Anzahl))

Result:

  Procedure Location Anzahl
  <fct>     <fct>     <int>
1 1         a             2
2 1         c             1
3 2         a             1
4 3         b             1
5 4         b             1

What I need:

# A tibble: 4 x 4
  Procedure     a     b     c
  <fct>     <int> <int> <int>
1 1             2     0     1
2 2             1     0     0
3 3             0     1     0
4 4             0     1     0

So I want to arrange the data frame by Procedure and Location, with one column per Location. This is what I tried:

df %>%
  group_by(Procedure, Location) %>%
  summarise(Anzahl = n_distinct(Case)) %>%
  pivot_wider(names_from = Location, values_from = n, values_fill = list(n = 0))

But: Error: This tidyselect interface doesn't support predicates yet. i Contact the package author and suggest using eval_select().

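I tried to solve this in other questions I asked earlier (at this point it almost feels like spam), but I could not apply the solutions to my original data frame. The code shown above (group_by, summarise) also works on the original data. The only problem is that it does not arrange the result by Location.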

Regards

Tags: r

Solution


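This should work. The summarised column is named Anzahl, not n, so values_from and values_fill have to refer to Anzahl: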

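library(dplyr)  # group_by, summarise, arrange, n_distinct
library(tidyr)  # pivot_wider

df %>%
  group_by(Procedure, Location) %>%
  # count each Case once per Procedure/Location combination
  summarise(Anzahl = n_distinct(Case)) %>%
  arrange(Location, desc(Anzahl)) %>%
  # one column per Location; missing combinations become 0
  pivot_wider(names_from = Location, values_from = Anzahl, values_fill = list(Anzahl = 0))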

This gives us:

  Procedure     a     b     c
  <chr>     <int> <int> <int>
1 1             2     0     1
2 2             1     0     0
3 3             0     1     0
4 4             0     1     0
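
As a cross-check that needs no packages, the same counts can probably be reproduced in base R. This is only a sketch under the assumption that a contingency table (rather than a tibble) is acceptable as output; the names distinct_rows and counts are made up for illustration and do not appear in the question.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Case = c("Siddhartha", "Siddhartha", "Siddhartha", "Paul", "Paul", "Paul",
           "Hannah", "Herbert", "Herbert"),
  Procedure = c("1", "1", "2", "3", "3", "4", "1", "1", "1"),
  Location = c("a", "a", "a", "b", "b", "b", "c", "a", "a")
)

# Keep one row per Case/Procedure/Location combination,
# so every case is counted only once per cell (like n_distinct(Case)).
distinct_rows <- unique(df[, c("Case", "Procedure", "Location")])

# Cross-tabulate Procedure against Location;
# combinations that never occur are reported as 0.
counts <- table(distinct_rows$Procedure, distinct_rows$Location,
                dnn = c("Procedure", "Location"))
print(counts)
```

If a data frame is needed afterwards, as.data.frame.matrix(counts) should convert the table into one with a column per Location.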
