How can we show 0 for the levels with 0 observations in a factor variable while using dcast

Problem description

I have a df like this:


df<-structure(list(AEOUT = c("RECOVERED/RESOLVED", "RECOVERED/RESOLVED", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "FATAL", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"RECOVERED/RESOLVED", "RECOVERED/RESOLVED", "RECOVERED/RESOLVED", 
"NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"NOT RECOVERED/NOT RESOLVED"), AEREL1S = c("UNRELATED", "UNRELATED", 
"UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", 
"UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", "RELATED", 
"RELATED", "RELATED", "RELATED", "UNRELATED", "UNRELATED", "UNRELATED"
)), row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"
))
> test<-df %>%dcast(.,AEOUT~AEREL1S)
Using 'AEREL1S' as value column. Use 'value.var' to override
Aggregation function missing: defaulting to length
Warning message:
In dcast(., AEOUT ~ AEREL1S) :
  The dcast generic in data.table has been passed a tbl_df and will attempt to redirect to the reshape2::dcast; please note that reshape2 is deprecated, and this redirection is now deprecated as well. Please do this redirection yourself like reshape2::dcast(.). In the next version, this warning will become an error.
> dput(head(AE_OC, n=18))
structure(list(AEOUT = c("RECOVERED/RESOLVED", "RECOVERED/RESOLVED", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "FATAL", 
"RECOVERED/RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"RECOVERED/RESOLVED", "RECOVERED/RESOLVED", "RECOVERED/RESOLVED", 
"NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", 
"NOT RECOVERED/NOT RESOLVED"), AEREL1S = c("UNRELATED", "UNRELATED", 
"UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", 
"UNRELATED", "UNRELATED", "UNRELATED", "UNRELATED", "RELATED", 
"RELATED", "RELATED", "RELATED", "UNRELATED", "UNRELATED", "UNRELATED"
)), row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"
))

AEOUT has 6 levels. How can I show 0 for the levels that do not appear in the table?

AEOUT = factor(AEOUT, levels = c("RECOVERED/RESOLVED","RECOVERED/RESOLVED WITH SEQUELAE", "RECOVERING/RESOLVING", "NOT RECOVERED/NOT RESOLVED", "FATAL", "UNKNOWN"))

I tried to summarize the data. Is it possible to keep a level even when it has 0 observations?

My current code is:

test<-df %>%dcast(.,AEOUT~AEREL1S)

The output looks like this:

[image: current dcast output]

My ideal output should look like this, in this order:

[image: desired output]

Tags: r

Solution


You can use the dplyr::count function with .drop = FALSE to do just what you want:

library(dplyr)
df %>% 
   mutate(AEOUT = factor(AEOUT, levels = c("RECOVERED/RESOLVED","RECOVERED/RESOLVED WITH SEQUELAE", "RECOVERING/RESOLVING", "NOT RECOVERED/NOT RESOLVED", "FATAL", "UNKNOWN"))) %>%
   count(AEOUT, .drop = FALSE)
## A tibble: 6 x 2
#  AEOUT                                n
#  <fct>                            <int>
#1 RECOVERED/RESOLVED                   8
#2 RECOVERED/RESOLVED WITH SEQUELAE     0
#3 RECOVERING/RESOLVING                 0
#4 NOT RECOVERED/NOT RESOLVED           9
#5 FATAL                                1
#6 UNKNOWN                              0
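
The count above gives one total per AEOUT level, while the question asks for the counts split by AEREL1S, as dcast(AEOUT ~ AEREL1S) would produce. A minimal sketch of one way to get that layout with base R's table() instead of dcast is shown below. It is not the original answerer's method. It assumes the 18 rows from the question, rebuilt here compactly, and the names rec, notrec, lv, tab and wide are only illustrative.

```r
# Rebuild the 18 rows from the question (same values, written compactly).
rec    <- "RECOVERED/RESOLVED"
notrec <- "NOT RECOVERED/NOT RESOLVED"
aeout <- c(rep(rec, 3), rep(notrec, 2), rec, notrec, "FATAL", rec,
           rep(notrec, 2), rep(rec, 3), rep(notrec, 4))
aerel <- c(rep("UNRELATED", 11), rep("RELATED", 4), rep("UNRELATED", 3))

# The six levels from the question, in the desired display order.
lv <- c(rec, "RECOVERED/RESOLVED WITH SEQUELAE", "RECOVERING/RESOLVING",
        notrec, "FATAL", "UNKNOWN")

df <- data.frame(AEOUT = factor(aeout, levels = lv), AEREL1S = aerel)

# table() counts every factor level, including levels with no rows,
# so unobserved levels show up as 0 and keep the level order.
tab <- table(df$AEOUT, df$AEREL1S)

# Convert the two-way table to a wide data frame with AEOUT as a column.
wide <- as.data.frame.matrix(tab)
wide <- cbind(AEOUT = rownames(wide), wide)
rownames(wide) <- NULL
print(wide)
```

The result has six rows in level order, with RELATED and UNRELATED as columns. The three levels that never occur in the data appear with 0 in both columns.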
