首页 > 解决方案 > How to sort bars according to 1 of 2 groups in a facet wrap?

问题描述

I hope someone can help me with the following problem: I would like to display the value (avg) of different laboratory parameters (parameter) of 2 different groups (gruppe). Additionally, I want to plot this information according to the change over time (performance) in 3 different facets. Here a tibble of the dataset:

# A tibble: 402 x 4
# Groups:   gruppe, parameter [134]
   gruppe parameter                      performance     avg
   <chr>  <chr>                          <chr>         <dbl>
 1 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t1t0 NaN    
 2 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t2t0  37.6  
 3 DGE    ACPA(citrull. Prot.-Ak) EIA/Se change_t3t0 NaN    
 4 Fasten Apolipoprot. A1 HP             change_t1t0  41.2 
 5 DGE    Apolipoprot. A1 HP             change_t2t0 NaN    
 6 DGE    Apolipoprot. A1 HP             change_t3t0 NaN    
 7 DGE    Apolipoprotein B               change_t1t0 NaN    
 8 DGE    Apolipoprotein B               change_t2t0 NaN    
 9 Fasten Apolipoprotein B               change_t3t0 NaN    
10 DGE    aPTT Pathromtin SL             change_t1t0   0.571
# … with 392 more rows

This worked totally fine using this code:

#Create labels for 3 facets
lab_labels <- c("Change from Baseline to Day 7 [%]",
                "Change from Baseline to Week 6 [%]",
                "Change from Baseline to Week 12 [%]")

names(lab_labels) <- c("change_t1t0",
                       "change_t2t0",
                       "change_t3t0")

labor_summ_long %>%
  filter(parameter %in% c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV")) %>%
  arrange(desc(avg))%>%
  group_by(gruppe, performance)%>%
  ggplot(aes(x=reorder(parameter,avg), y=avg, group=gruppe, fill = gruppe))+
  geom_col(position = position_dodge())+
  facet_wrap(~performance, 
             scales ="free_y", 
             dir="v",
             labeller = labeller(performance = lab_labels))+
  ylab("") + 
  xlab("") + 
  labs(color="", linetype="")+
  theme_pubclean()+
  theme(strip.background=element_rect(fill="lightgrey"),
        strip.text = element_text(face="bold"),
        legend.position = "bottom",
        legend.title=element_blank())+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust = 1))+
  scale_x_discrete(labels = c("Hämoglobin"="Hemoglobin", "Leukozyten" = "Leucocytes",
                              "MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten"="Thrombocytes",
                              "MPV", "Hämatokrit (l/l)"="Hematocrite"))+
  scale_fill_discrete(labels=c('DGE', "Fasten"='Fasting'))

This is how the plot looks like

What I am missing and am failing to find the solution to: I would like to order the bars...

I tricked around with arrange, sort, etc. but couldn't get all the conditions above together.

Do you have any ideas? Thanks a lot in advance!

标签: rsortingggplot2bar-chartfacet-wrap

解决方案


The issue is that reorder reorders by taking the mean of all values for each parameter without taking account of any grouping.

Adapting this answer to your case and making use of some random example data to mimic your real data this could be achieved like so:

The helper function reorder_where allows to order the categories by an additional condition, e.g. in your case where gruppe == "Fasten" & performance == "change_t1t0" is TRUE

library(dplyr)
library(ggplot2)

reorder_where <- function (x, by, where, fun = mean, ...) {
  xx <- x[where]
  byby <- by[where]
  byby <- tapply(byby, xx, FUN = fun, ...)[x]
  reorder(x, byby)
}

labor_summ_long %>%
  filter(parameter %in% c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV")) %>%
  ggplot(aes(x=reorder_where(parameter, -avg, gruppe == "Fasten" & performance == "change_t1t0"), y=avg, group=gruppe, fill = gruppe))+
  geom_col(position = position_dodge())+
  facet_wrap(~performance, 
             scales ="free_y", 
             dir="v",
             labeller = labeller(performance = lab_labels))+
  ylab("") + 
  xlab("") + 
  labs(color="", linetype="")+
  #theme_pubclean()+
  theme(strip.background=element_rect(fill="lightgrey"),
        strip.text = element_text(face="bold"),
        legend.position = "bottom",
        legend.title=element_blank())+
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust = 1))+
  scale_x_discrete(labels = c("Hämoglobin"="Hemoglobin", "Leukozyten" = "Leucocytes",
                              "MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten"="Thrombocytes",
                              "MPV", "Hämatokrit (l/l)"="Hematocrite"))+
  scale_fill_discrete(labels=c('DGE', "Fasten"='Fasting'))

DATA

set.seed(42)

labor_summ_long <- data.frame(
  parameter = sample(c("Hämatokrit (l/l)","Hämoglobin", "Leukozyten","MCV", "MCH", "MCHC", "RDW-CV", "Thromobzyten","MPV"), 100, replace = TRUE),
  gruppe = sample(c("DGE", "Fasten"), 100, replace = TRUE),
  performance = sample(c("change_t1t0",
                         "change_t2t0",
                         "change_t3t0"), 100, replace = TRUE),
  avg = runif(100, 0, 50)
)
labor_summ_long <- dplyr::distinct(labor_summ_long, parameter, gruppe, performance, .keep_all = TRUE)

推荐阅读