How to make a stacked-bar plot of a confusion matrix

Problem description

I have two categorical variables that I want to compare through cross-tabulation. I made a dummy example of an extended square contingency table in which the categories of X are in the table’s rows and the same sequence of categories for Y is in the table’s columns. The table summarizes the association between X and Y.

The table’s diagonal entries give the number of observations for which the X category matches the Y category, in which case the observations are Hits for the category. Each off-diagonal entry is a False Alarm for the X category and a Miss for the Y category.
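As a minimal worked illustration of these definitions, here is a sketch in base R. The 2x2 table `tab` and its values are made up for this illustration and are not the dummy data used further down.

```r
# Hypothetical 2x2 contingency table: rows are X categories, columns are Y categories.
tab <- matrix(c(5, 1,
                2, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(X = c("A", "B"), Y = c("A", "B")))

hits         <- diag(tab)            # diagonal: X category matches Y category
false.alarms <- rowSums(tab) - hits  # off-diagonal entries in each row (X category)
misses       <- colSums(tab) - hits  # off-diagonal entries in each column (Y category)

print(rbind(hits, false.alarms, misses))
```

Every off-diagonal entry is counted once as a False Alarm and once as a Miss, so the two totals agree (both are 3 in this small table).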

extended square contingency table

Graph1

I want to make a stacked bar plot like Graph1 that shows each row of the table as a bar, with the fill colors coming from the columns (the Y variable). The last two bars should show the Misses and False Alarms for each category.

I managed to make two separate graphs. The code below generates the first four rows of graph1.

# Load the packages used below: melt() comes from reshape2, ggplot() from ggplot2
library(reshape2)
library(ggplot2)

# Create Dummy Input
sample.mtx <- matrix(c(1,0,2,0,0,3,0,3,2,3,3,0,0,0,0,3), nrow = 4)
categories <- c("A","B","C","D")
colnames(sample.mtx) <- categories
rownames(sample.mtx) <- categories

# Change from wide to long format
g1.df <- melt(sample.mtx)
# Zero-size entries were causing problems, so I removed them.
g1.df <- g1.df[g1.df$value!=0,]

# Add a label column that marks the diagonal entries as "Hit".
g1.df$label <- ifelse(as.character(g1.df$Var1)==as.character(g1.df$Var2), "Hit", "")

# Plotting
plot1 <- ggplot(data=g1.df, mapping=aes(fill=Var2, y=value, x=Var1, label = label))+
  geom_bar(width = 0.6, position="stack", stat="identity")+
  labs(x="Table Feature", y="Entry size as the number of observations", title="Entry Size") +
  geom_text(size = 4, position = position_stack(vjust = 0.5))+
  coord_flip()+
  theme_bw()+
  scale_x_discrete(limits = c("D", "C", "B", "A"))+
  scale_y_continuous(breaks = seq(0, 10, 1))+
  theme(plot.title = element_text(family = "Times", color = "#353535", 
                                  face = "bold", size = 12, hjust = 0.5))+
  theme(legend.position = "bottom", legend.title = element_blank())+
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  ) 

Plot1 shows the result. The problem is that the components of the stacked bars are not in the right order. I'd be grateful if someone could explain how I can arrange the components of each bar.

To plot Misses and False Alarms this is what I coded:

# Hits, False Alarm, and Miss
hits <- diag(sample.mtx)
false.alarms <- rowSums(sample.mtx) - hits
misses <- colSums(sample.mtx) - hits

# Make a data frame (data.frame() keeps misses and false.alarms numeric,
# whereas cbind() would coerce every column to character)
g1.df1 <- data.frame(categories, misses, false.alarms)

# Change it to long format and get rid of zero sizes.
g1.df1.m <- melt(g1.df1, id.vars="categories")
g1.df1.m <- g1.df1.m[g1.df1.m$value!=0,]

# Plotting
plot2 <- ggplot(data=g1.df1.m, mapping=aes(fill=categories, y=value, x=variable))+
  geom_bar(width = 0.6, position="stack", stat="identity")+
  coord_flip()+
  theme_bw()+
  theme(legend.position = "none")+
  scale_x_discrete(limits = c("misses", "false.alarms"))+
  scale_y_continuous(breaks = seq(0, 10, 1))

Plot2 shows the result, and I am happy with this plot. But what I want is to have both plot1 and plot2 in one plot, as shown in Graph1. Can anyone provide guidance on how to draw stacked-bar plots from different data frames? Or is there a better way to make Graph1?

Tags: r, ggplot2, bar-chart, data-visualization

Solution


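"How can I arrange the components of the bar?"

The trick is to use the `levels` attribute of the factor-type column (here, `categories`). You need to reverse the order of the levels.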
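A small base-R illustration of that idea, using a made-up vector `x` rather than the question's data. ggplot2 takes the order of a discrete variable from its factor levels, so reversing the levels changes the order without touching the values themselves.

```r
# Made-up example vector; not part of the question's data.
x <- factor(c("A", "C", "B", "A"))
print(levels(x))            # default levels are sorted alphabetically

# Rebuild the factor with the levels reversed; the values stay the same.
x.rev <- factor(x, levels = rev(levels(x)))
print(levels(x.rev))        # same levels, opposite order
print(as.character(x.rev))  # the data are unchanged
```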

"What I want is to have both plot1 and plot2 in one plot, like I have shown in graph1."

If you just want to reproduce Graph1 with your code, this works:

#-------------------
#Data wrangling
library(dplyr) # provides %>% and bind_rows()
colnames(g1.df)[1] <- "categories"; colnames(g1.df)[2] <- "variable" # rename the columns to match the 2nd df
g1.df1.m[,3] <- as.numeric(as.character(g1.df1.m[,3])) # make sure the value column is numeric, like the corresponding column in `g1.df`
g1.dfCombined <- g1.df %>% bind_rows(g1.df1.m) # merge the two dfs
#this is the part that reverses the order:
g1.dfCombined$categories <- factor(g1.dfCombined$categories, levels = rev(categories))

#-------------------
#Plotting: (all same except dropped `scale_x_discrete(limit = c("D", "C", "B", "A"))`)
ggplot(data=g1.dfCombined, mapping=aes(fill=categories, y=value, x=variable, label = label))+
  geom_bar(width = 0.6, position="stack", stat="identity")+
  labs(x="Table Feature", y="Entry size as the number of observations", title="Entry Size") +
  geom_text(size = 4, position = position_stack(vjust = 0.5)) +
  coord_flip()+theme_bw() + scale_y_continuous(breaks = seq(0, 10, 1)) +
  theme(plot.title = element_text(family = "Times", color = "#353535", face = "bold",
                                  size = 12, hjust = 0.5)) +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  theme(panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank())


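Note: it may be a good idea to use different values (e.g., 1, 2, 3, ... instead of A, B, C), because the latter appear twice, in two different variables (`categories` and `variable`), which can be confusing.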
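One possible way to avoid that ambiguity is to build a single long data frame in which the bar and the fill color have clearly distinct column names. The following is only a sketch under a few assumptions: the column names `bar`, `fill.cat` and `size` are invented for this illustration, base R's `as.data.frame(as.table())` stands in for `melt()`, and `geom_col()` replaces `geom_bar(stat = "identity")`. Here each table row becomes one bar, filled by the column (Y) category.

```r
library(ggplot2)

# Dummy table from the question: rows are X categories, columns are Y categories.
categories <- c("A", "B", "C", "D")
sample.mtx <- matrix(c(1,0,2,0, 0,3,0,3, 2,3,3,0, 0,0,0,3), nrow = 4,
                     dimnames = list(categories, categories))

# One bar per table row, filled by the column (Y) category.
rows.df <- as.data.frame(as.table(sample.mtx), stringsAsFactors = FALSE)
names(rows.df) <- c("bar", "fill.cat", "size")
rows.df$label <- ifelse(rows.df$bar == rows.df$fill.cat, "Hit", "")

# Two extra bars: Misses and False Alarms, filled by the category they belong to.
hits <- diag(sample.mtx)
extra.df <- data.frame(
  bar      = rep(c("Misses", "False Alarms"), each = 4),
  fill.cat = rep(categories, times = 2),
  size     = unname(c(colSums(sample.mtx) - hits, rowSums(sample.mtx) - hits)),
  label    = "",
  stringsAsFactors = FALSE
)

# Combine both parts and drop the zero-size entries.
bars.df <- rbind(rows.df, extra.df)
bars.df <- bars.df[bars.df$size != 0, ]

# After coord_flip() the first level of x is drawn at the bottom,
# so the bars are listed bottom-up here.
bars.df$bar <- factor(bars.df$bar,
                      levels = c("False Alarms", "Misses", rev(categories)))
# Reversed fill levels, following the levels trick described above.
bars.df$fill.cat <- factor(bars.df$fill.cat, levels = rev(categories))

p <- ggplot(bars.df, aes(x = bar, y = size, fill = fill.cat, label = label)) +
  geom_col(width = 0.6) +
  geom_text(size = 4, position = position_stack(vjust = 0.5)) +
  coord_flip() +
  scale_y_continuous(breaks = seq(0, 10, 1)) +
  labs(x = "Table Feature", y = "Entry size as the number of observations",
       title = "Entry Size") +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())
print(p)
```

If the segments come out in the opposite order from the one you want, dropping the `rev()` around `categories` in the `fill.cat` line flips the stacking.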

