首页 > 解决方案 > Combination of reshaping and aggregating a data.frame

问题描述

I'm still somewhat new so please forgive any missteps, but I currently have a data.frame that looks like this.

Outcome1  Outcome2  Num_Occurances Name
False       False       2          John Doe
False       True        2          John Doe
True        False       4          John Doe
True        True        2          John Doe
False       True        1          Sally Doe
True        False       1          Sally Doe

I would like to both reshape and aggregate the data into a wider format, to eventually look like this:

Name        successful_Outcome2  Total_Occurances  successful_Outcome1    Total_Occurances_Outcome1 
John Doe           4                  10                   2                        6
Sally Doe          1                   2                   0                        1

I understand that the dcast function can be used to melt and recast data into a wide format, but the combination of the different outcomes is throwing me for a loop. Any help would be appreciated!

标签: rdataframeaggregatedcast

解决方案


A base R solution with aggregate + transform, i.e.,

dfout <- aggregate(.~Name,
                   transform(df,
                             successful_outcome2 = Outcome2*Num_Occurances,
                             Total_Occurances = Num_Occurances,
                             successful_Outcome1 = Outcome1*Outcome2*Num_Occurances,
                             Total_Occurances_Outcome1 = Outcome1*Num_Occurances),
                   sum)[-(2:4)]

yielding

> dfout
       Name successful_outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
1  John Doe                   4               10                   2                         6
2 Sally Doe                   1                2                   0                         1

DATA

df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
-6L))

推荐阅读