首页 > 解决方案 > Create a binary wide table from a long table (like tidyr::spread() )

问题描述

As input for a tree model I created an analysis table in SQL. Now I want to transfer it to R because the model which has this table as input is also running in R. One of the SQL-steps I'm not able to transform into R.

The analysis table has the following form:

df <- data.frame(
  pseudonym = c("a", "a", "a", "b", "c", "c"),
  var1 = c(1,1,0,1,1,0),
  var2 = c(1,0,0,0,0,1),
  var3 = c(0,0,0,0,0,1))

> df
  pseudonym var1 var2 var3
1         a    1    1    0
2         a    1    0    0
3         a    0    0    0
4         b    1    0    0
5         c    1    0    0
6         c    0    1    1

In the next step I need the disctinct rows for pseudonym with keeping the information (1) from the other columns var1, var2, var3. (In SQL this is created through max(case when...then 1 else 0 end) as var1 )

Thus the result df2 created from df1 should be

df2 <- data.frame(
  pseudonym = c("a", "b", "c"),
  var1 = c(1,1,1),
  var2 = c(1,0,1),
  var3 = c(0,0,1))

> df2
  pseudonym var1 var2 var3
1         a    1    1    0
2         b    1    0    0
3         c    1    1    1

It would be very helpful if somebody has an idea.

标签: rdplyrtidyrreshape2

解决方案


Here's one way:

library(dplyr)
library(tidyr)

df <- data.frame(
  pseudonym = c("a", "a", "a", "b", "c", "c"),
  var1 = c(1,1,0,1,1,0),
  var2 = c(1,0,0,0,0,1),
  var3 = c(0,0,0,0,0,1))

df %>% 
  pivot_longer(cols = var1:var3) %>% 
  group_by(pseudonym, name) %>% 
  filter(max(value) == value) %>% 
  ungroup() %>% 
  distinct() %>% 
  pivot_wider(names_from = name, values_from = value)

#># A tibble: 3 x 4
#>  pseudonym  var1  var2  var3
#>  <fct>     <dbl> <dbl> <dbl>
#>1 a             1     1     0
#>2 b             1     0     0
#>3 c             1     1     1

推荐阅读