geom_tile does not render the true colors

Problem description

I am trying to draw a plot in R with ggplot2, using geom_tile and the following data:

ici_table <- structure(list(Taxonomy = c("f__Muribaculaceae g__Muribaculaceae", 
"f__Muribaculaceae g__Muribaculum", "f__Muribaculaceae g__Muribaculum-1", 
"f__Muribaculaceae g__Muribaculaceae-2", "f__Muribaculaceae g__Muribaculaceae-3", 
"f__Muribaculaceae g__Muribaculaceae-4", "f__Muribaculaceae g__Muribaculaceae-5", 
"f__Muribaculaceae g__Muribaculaceae-6", "f__Muribaculaceae g__Muribaculaceae-7", 
"f__Muribaculaceae g__Muribaculaceae-8", "f__Muribaculaceae g__Muribaculaceae-9", 
"f__Muribaculaceae g__Muribaculaceae-10", "f__Muribaculaceae g__Muribaculaceae-11", 
"f__Muribaculaceae g__Muribaculaceae-12", "f__Muribaculaceae g__Muribaculaceae-13", 
"f__Muribaculaceae g__Muribaculaceae-14", "f__Eggerthellaceae g__Enterorhabdus", 
"f__Eggerthellaceae g__Enterorhabdus-1", "f__Desulfovibrionaceae g__Desulfovibrio", 
"f__Desulfovibrionaceae g__Desulfovibrio-1", "f__Pseudomonadaceae g__Pseudomonas", 
"f__Pseudomonadaceae g__Pseudomonas-1", "f__Peptostreptococcaceae g__Romboutsia", 
"f__Peptostreptococcaceae g__Romboutsia-1", "f__Clostridiaceae g__Clostridium_sensu_stricto_1", 
"f__Clostridiaceae g__Clostridium_sensu_stricto_1-1", "f__Erysipelotrichaceae g__Dubosiella", 
"f__Erysipelotrichaceae g__Dubosiella-1", "f__Erysipelotrichaceae g__Dubosiella-2", 
"f__Erysipelotrichaceae g__Dubosiella-3", "f__Lactobacillaceae g__Lactobacillus", 
"f__Clostridiaceae g__Candidatus_Arthromitus", "f__Oscillospiraceae g__", 
"f__Ruminococcaceae g__Paludicola", "f__Lachnospiraceae g__", 
"f__Lachnospiraceae g__uncultured", "f__Lachnospiraceae g__Lachnospiraceae_FCS020_group", 
"f__Lachnospiraceae g__uncultured-1", "f__Lachnospiraceae g__uncultured-2", 
"f__Lachnospiraceae g__Blautia", "f__Lachnospiraceae g__Blautia-1", 
"f__Lachnospiraceae g__uncultured-2", "f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group", 
"f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group-1", "f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group-2", 
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__uncultured-4", 
"f__Lachnospiraceae g__-5", "f__Lachnospiraceae g__-6", "f__Lachnospiraceae g__GCA-900066575", 
"f__Lachnospiraceae g__[Eubacterium]_xylanophilum_group", "f__Lachnospiraceae g__[Eubacterium]_xylanophilum_group-1", 
"f__Lachnospiraceae g__uncultured-2", "f__Lachnospiraceae g__-3", 
"f__Lachnospiraceae g__Marvinbryantia", "f__Lachnospiraceae g__Marvinbryantia-1", 
"f__Lachnospiraceae g__Lachnospiraceae_UCG-006", "f__Lachnospiraceae g__-1", 
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__Roseburia", 
"f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__A2", 
"f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__-2", 
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__Lachnoclostridium", 
"f__Lachnospiraceae g__Lachnoclostridium-1", "f__Lachnospiraceae g__Lachnoclostridium-2", 
"f__Lachnospiraceae g__Lachnoclostridium-3", "f__Lachnospiraceae g__Lachnoclostridium-4"
), ICI = c(1.270207852194, 0.939759036144578, 1.08761904761905, 
0.9, 0.727611940298507, 0.883895131086142, 0.70253164556962, 
1.45454545454545, 1.0327868852459, 0.760598503740648, 1.3495145631068, 
1.27551020408163, 1.73170731707317, 1.77027027027027, 0.973867595818815, 
0.81981981981982, 0.945454545454546, 18, 0.652707275803723, 0.575313807531381, 
0.947368421052632, 2.52380952380952, 1.11811023622047, 1.30864197530864, 
2.4078431372549, 2.72727272727273, 1.02564102564103, 0.658536585365854, 
0.926829268292683, 0.833333333333333, 0.18705035971223, 12.2, 
10, 1, 1.56910569105691, 5.25, 0.918032786885246, 0.857142857142857, 
1, 1.94444444444444, 1.85365853658537, 0.471698113207547, 14, 
1.22222222222222, 4.95, 1.38983050847458, 13, 0.977777777777778, 
1.72727272727273, 2.5, 1.25, 1.47540983606557, 0.7, 1.06666666666667, 
0.909090909090909, 1.12903225806452, 0.846153846153846, 0.806201550387597, 
95, 0.694117647058824, 0.970588235294118, 0.836842105263158, 
1.06507592190889, 0.39344262295082, 0.976744186046512, 1.25196850393701, 
1.28333333333333, 0.790960451977401, 0.7, 1.21995332555426), 
    Sample_ID = c(" ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", 
    " ")), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", 
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", 
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", 
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52", 
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63", 
"64", "65", "66", "67", "68", "69", "70"), class = "data.frame")

My goal is to create a single-column "heatmap" with the following code:

library(ggplot2)

brks <- c(0, 10, 25, 50, 75, 100)
g <- ggplot(data = ici_table, aes(x=Sample_ID, y=Taxonomy, fill=ICI)) + 
  geom_tile(width=0.08, height=0.95) + 
  scale_fill_gradientn(colors=c("black", "steelblue4",
                                "steelblue3", "steelblue2", "steelblue",
                                "yellow2", "yellow1", "yellow"),
                       limits=c(0,100),
                       breaks=brks, labels=brks) + 
  labs(x="ICI", y="") +
  theme(panel.background = element_blank(),
        legend.position="right",
        panel.border = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin= unit(c(1,1,1,-0.1), "cm"),
        panel.grid = element_blank())

g

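Surprisingly, though, the colors do not match the actual values (for example, I cannot find the yellow tile for the ICI = 95 that is in my data).

I would appreciate any help. Thanks.

Tags: r, ggplot2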

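Answer


TL;DR

A few things push the fill colors toward the "black" end of the palette, but on closer inspection the real culprit is that several Taxonomy values occur more than once in the data. Your value of 95 (row 59) is most likely drawn over by a later value (row 65), because both rows have the same Taxonomy. That explains why, as your question title puts it, "geom_tile does not render the true colors".

One would expect the biggest offender to be the large outliers in the ICI column, which compress all the other values into a very small part of the color range. This can be verified: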

summary(ici_table$ICI)

Output:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1870  0.8489  1.0489  3.4674  1.5457 95.0000 
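
To make the compression concrete, here is a small base-R sketch of the arithmetic. It assumes the fill scale places each value linearly between the two limits, which is my understanding of the default for continuous scales. The variable names are mine, and the numbers are copied from the summary above.

```r
# Limits as set in scale_fill_gradientn(limits = c(0, 100))
limits <- c(0, 100)

# Values copied from the summary() output above
ici <- c(min = 0.1870, median = 1.0489, q3 = 1.5457, max = 95)

# Relative position of each value along the gradient
# (0 = first colour, 1 = last colour), assuming linear rescaling
position <- (ici - limits[1]) / (limits[2] - limits[1])
print(round(position, 4))
```

Under that assumption, three quarters of the values land in the first 2% of the gradient, right next to black, while only the maximum gets anywhere near yellow.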

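"Surprisingly, though, the colors do not match the actual values (for example, I cannot find the yellow tile for the ICI = 95 that is in my data)."

As I pointed out in the TL;DR, you will not see a yellow tile for the exact value ICI = 95 (row 59) in the plot either, because it belongs to the f__Lachnospiraceae g__-3 taxonomy, and it is not the only value observed for that taxonomy. ggplot draws one tile per row at the same position, so the tile you end up seeing for f__Lachnospiraceae g__-3 is the later one, which happens to be row 65.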
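
A quick way to confirm the duplicates is to look for repeated Taxonomy/Sample_ID pairs. The sketch below uses only base R and a copy of rows 59 to 65 of your data; the data frame name rows_59_65 is just for illustration, and on the real data you would run the same calls on ici_table.

```r
# Rows 59 to 65 of the data in the question, copied by hand
rows_59_65 <- data.frame(
  Taxonomy = c("f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__Roseburia",
               "f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__A2",
               "f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__-2",
               "f__Lachnospiraceae g__-3"),
  Sample_ID = " ",
  ICI = c(95, 0.694117647058824, 0.970588235294118, 0.836842105263158,
          1.06507592190889, 0.39344262295082, 0.976744186046512)
)

# Each tile position is defined by the (Taxonomy, Sample_ID) pair
key <- rows_59_65[c("Taxonomy", "Sample_ID")]

# Flag every row whose position is shared with at least one other row
is_repeat <- duplicated(key) | duplicated(key, fromLast = TRUE)
print(rows_59_65[is_repeat, ])

# Number of distinct tile positions, i.e. tiles you can actually see
n_tiles <- nrow(unique(key))
print(n_tiles)
```

The flagged rows include both the 95 and the value that is drawn over it, and the number of distinct positions is smaller than the number of rows.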

As a check, plot the last 15 rows of the data frame; you get fewer than 15 tiles in the plot:

ggplot(data = tail(ici_table, 15), aes(x=Sample_ID, y=Taxonomy, fill=ICI)) + ...

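To reiterate, you will not find 15 squares/tiles for the 15 rows of tail(ici_table, 15). This is the main problem: you need to aggregate the values, for example with sum or mean, before visualizing them in a heatmap.

In other words, since you probably want the heatmap to show an aggregated value (for example, the sum of all observations within each Taxonomy), you should perform the aggregation before plotting.

Solution

The following code visualizes the average ICI (AvgICI), grouped by Taxonomy: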

library(dplyr)
ici_table %>% 
  tail(15) %>% 
  group_by(Taxonomy, Sample_ID) %>% 
  summarise(AvgICI = mean(ICI)) %>% 
  ungroup() %>% 
  ggplot(aes(x=Sample_ID, y=Taxonomy, fill=AvgICI)) + 
  geom_tile(width=0.08, height=0.95) + 
  scale_fill_gradientn(colors=c("black", "steelblue4",
                                "steelblue3", "steelblue2", "steelblue",
                                "yellow2", "yellow1", "yellow"),
                       limits=c(0,100)
                       )

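This time you will definitely see the group with the large outlier (f__Lachnospiraceae g__-3), because its mean is 47.988, whereas the other groups have values in the 0 to 1.3 range.

You can also use sum instead of mean as the aggregation function and observe the difference.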
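
To see the two aggregations side by side without plotting, here is a minimal base-R sketch on three rows copied from your data (rows 59, 60 and 65). It uses tapply() as a stand-in for group_by()/summarise(), and the names tail_rows, avg_ici, sum_ici and comparison are made up for the example.

```r
# Rows 59, 60 and 65 of the data in the question, copied by hand
tail_rows <- data.frame(
  Taxonomy = c("f__Lachnospiraceae g__-3",
               "f__Lachnospiraceae g__Roseburia",
               "f__Lachnospiraceae g__-3"),
  ICI = c(95, 0.694117647058824, 0.976744186046512)
)

# One value per Taxonomy: the mean and the sum of its ICI observations
avg_ici <- tapply(tail_rows$ICI, tail_rows$Taxonomy, mean)
sum_ici <- tapply(tail_rows$ICI, tail_rows$Taxonomy, sum)

comparison <- data.frame(
  Taxonomy = names(avg_ici),
  AvgICI   = as.vector(avg_ici),
  SumICI   = as.vector(sum_ici)
)
print(comparison)
```

For f__Lachnospiraceae g__-3 the mean of these two rows is the 47.988 quoted above, and the sum is roughly twice that, so the choice of aggregation function changes which color the tile gets.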

# Same aggregation on the full data set, with legend breaks and the theme from the question
brks <- c(0, 10, 25, 50, 75)
library(dplyr)
ici_table %>% 
  group_by(Taxonomy, Sample_ID) %>% 
  summarise(AvgICI = mean(ICI)) %>% 
  ungroup() %>% 
  ggplot(aes(x=Sample_ID, y=Taxonomy, fill=AvgICI)) + 
  geom_tile(width=0.08, height=0.95) + 
  scale_fill_gradientn(colors=c("black", "steelblue4",
                                "steelblue3", "steelblue2", "steelblue",
                                "yellow2", "yellow1", "yellow"),
                       limits=c(0,100),
                       breaks=brks, 
                       labels=brks
                       ) +
  labs(x="ICI (Average)", y="") +
  theme(panel.background = element_blank(),
        legend.position="right",
        panel.border = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin= unit(c(1,1,1,-0.1), "cm"),
        panel.grid = element_blank())

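With your theme applied, you can still clearly pick out the outlier group. Note that, because the values are compressed toward the lower limit, most tiles will look nearly black.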
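
If the near-black tiles are a problem, one possible way to spread the colors out is to map the fill to log10 of the aggregated value. This is only a sketch under my own assumptions: it needs all values to be strictly positive, the toy data frame and its values are invented for illustration, and the shortened palette and legend breaks are arbitrary choices rather than anything from your code.

```r
library(ggplot2)

# Toy aggregated data (hypothetical values spanning a similar range to yours)
toy <- data.frame(
  Taxonomy  = c("taxon_a", "taxon_b", "taxon_c", "taxon_d"),
  Sample_ID = " ",
  AvgICI    = c(0.187, 1.05, 10, 95)
)

# Legend ticks in the original units; their positions are computed in log10 units
legend_values <- c(0.1, 1, 10, 100)

p <- ggplot(toy, aes(x = Sample_ID, y = Taxonomy, fill = log10(AvgICI))) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(
    colours = c("black", "steelblue4", "steelblue", "yellow"),
    limits  = log10(c(0.1, 100)),   # assumes every value lies between 0.1 and 100
    breaks  = log10(legend_values),
    labels  = legend_values,
    name    = "ICI (log scale)"
  )

p
```

The trade-off is that the legend is no longer linear, so equal color steps correspond to equal ratios rather than equal differences in ICI.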
