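r - geom_tile does not render true colors
Problem description
I am trying to draw a plot in R with geom_tile from ggplot2, using the following data (assigned here to ici_table, the name used in the plotting code):
ici_table <- structure(list(Taxonomy = c("f__Muribaculaceae g__Muribaculaceae",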
"f__Muribaculaceae g__Muribaculum", "f__Muribaculaceae g__Muribaculum-1",
"f__Muribaculaceae g__Muribaculaceae-2", "f__Muribaculaceae g__Muribaculaceae-3",
"f__Muribaculaceae g__Muribaculaceae-4", "f__Muribaculaceae g__Muribaculaceae-5",
"f__Muribaculaceae g__Muribaculaceae-6", "f__Muribaculaceae g__Muribaculaceae-7",
"f__Muribaculaceae g__Muribaculaceae-8", "f__Muribaculaceae g__Muribaculaceae-9",
"f__Muribaculaceae g__Muribaculaceae-10", "f__Muribaculaceae g__Muribaculaceae-11",
"f__Muribaculaceae g__Muribaculaceae-12", "f__Muribaculaceae g__Muribaculaceae-13",
"f__Muribaculaceae g__Muribaculaceae-14", "f__Eggerthellaceae g__Enterorhabdus",
"f__Eggerthellaceae g__Enterorhabdus-1", "f__Desulfovibrionaceae g__Desulfovibrio",
"f__Desulfovibrionaceae g__Desulfovibrio-1", "f__Pseudomonadaceae g__Pseudomonas",
"f__Pseudomonadaceae g__Pseudomonas-1", "f__Peptostreptococcaceae g__Romboutsia",
"f__Peptostreptococcaceae g__Romboutsia-1", "f__Clostridiaceae g__Clostridium_sensu_stricto_1",
"f__Clostridiaceae g__Clostridium_sensu_stricto_1-1", "f__Erysipelotrichaceae g__Dubosiella",
"f__Erysipelotrichaceae g__Dubosiella-1", "f__Erysipelotrichaceae g__Dubosiella-2",
"f__Erysipelotrichaceae g__Dubosiella-3", "f__Lactobacillaceae g__Lactobacillus",
"f__Clostridiaceae g__Candidatus_Arthromitus", "f__Oscillospiraceae g__",
"f__Ruminococcaceae g__Paludicola", "f__Lachnospiraceae g__",
"f__Lachnospiraceae g__uncultured", "f__Lachnospiraceae g__Lachnospiraceae_FCS020_group",
"f__Lachnospiraceae g__uncultured-1", "f__Lachnospiraceae g__uncultured-2",
"f__Lachnospiraceae g__Blautia", "f__Lachnospiraceae g__Blautia-1",
"f__Lachnospiraceae g__uncultured-2", "f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group",
"f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group-1", "f__Lachnospiraceae g__Lachnospiraceae_NK4A136_group-2",
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__uncultured-4",
"f__Lachnospiraceae g__-5", "f__Lachnospiraceae g__-6", "f__Lachnospiraceae g__GCA-900066575",
"f__Lachnospiraceae g__[Eubacterium]_xylanophilum_group", "f__Lachnospiraceae g__[Eubacterium]_xylanophilum_group-1",
"f__Lachnospiraceae g__uncultured-2", "f__Lachnospiraceae g__-3",
"f__Lachnospiraceae g__Marvinbryantia", "f__Lachnospiraceae g__Marvinbryantia-1",
"f__Lachnospiraceae g__Lachnospiraceae_UCG-006", "f__Lachnospiraceae g__-1",
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__Roseburia",
"f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__A2",
"f__Lachnospiraceae g__Roseburia-1", "f__Lachnospiraceae g__-2",
"f__Lachnospiraceae g__-3", "f__Lachnospiraceae g__Lachnoclostridium",
"f__Lachnospiraceae g__Lachnoclostridium-1", "f__Lachnospiraceae g__Lachnoclostridium-2",
"f__Lachnospiraceae g__Lachnoclostridium-3", "f__Lachnospiraceae g__Lachnoclostridium-4"
), ICI = c(1.270207852194, 0.939759036144578, 1.08761904761905,
0.9, 0.727611940298507, 0.883895131086142, 0.70253164556962,
1.45454545454545, 1.0327868852459, 0.760598503740648, 1.3495145631068,
1.27551020408163, 1.73170731707317, 1.77027027027027, 0.973867595818815,
0.81981981981982, 0.945454545454546, 18, 0.652707275803723, 0.575313807531381,
0.947368421052632, 2.52380952380952, 1.11811023622047, 1.30864197530864,
2.4078431372549, 2.72727272727273, 1.02564102564103, 0.658536585365854,
0.926829268292683, 0.833333333333333, 0.18705035971223, 12.2,
10, 1, 1.56910569105691, 5.25, 0.918032786885246, 0.857142857142857,
1, 1.94444444444444, 1.85365853658537, 0.471698113207547, 14,
1.22222222222222, 4.95, 1.38983050847458, 13, 0.977777777777778,
1.72727272727273, 2.5, 1.25, 1.47540983606557, 0.7, 1.06666666666667,
0.909090909090909, 1.12903225806452, 0.846153846153846, 0.806201550387597,
95, 0.694117647058824, 0.970588235294118, 0.836842105263158,
1.06507592190889, 0.39344262295082, 0.976744186046512, 1.25196850393701,
1.28333333333333, 0.790960451977401, 0.7, 1.21995332555426),
Sample_ID = c(" ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ", " ",
" ")), row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "67", "68", "69", "70"), class = "data.frame")
My goal is to create a single-column "heatmap" with the following code:
library(ggplot2)

# Legend breaks for the fill scale
brks <- c(0, 10, 25, 50, 75, 100)
g <- ggplot(data = ici_table, aes(x = Sample_ID, y = Taxonomy, fill = ICI)) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(colors = c("black", "steelblue4",
                                  "steelblue3", "steelblue2", "steelblue",
                                  "yellow2", "yellow1", "yellow"),
                       limits = c(0, 100),
                       breaks = brks, labels = brks) +
  labs(x = "ICI", y = "") +
  theme(panel.background = element_blank(),
        legend.position = "right",
        panel.border = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin = unit(c(1, 1, 1, -0.1), "cm"),
        panel.grid = element_blank())
g
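But surprisingly, the colors do not match the actual values (i.e., I cannot find a yellow square for ICI = 95 in my data).
I would appreciate any help. Thanks.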
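Solution
TLDR
Several things push the fill colors toward the "black" end of the color scale, but on closer inspection the real culprit is that some Taxonomy values occur more than once in the data. The tile for your value of 95 (row 59) is therefore most likely covered by the tile of a later row (row 65) with the same Taxonomy. This explains why, in the words of your question title, "geom_tile does not render true colors".
One would assume the biggest offender is the large outlier in the ICI column, which compresses all other values into a very small part of the color range. This can be verified: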
summary(ici_table$ICI)
Output:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1870  0.8489  1.0489  3.4674  1.5457 95.0000
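To see how strongly the 0 to 100 limits squeeze typical values, the quartiles from the summary above can be placed on the scale by hand. This is only a rough sketch: it assumes that scale_fill_gradientn spaces its eight colors evenly between the limits when no values argument is supplied, and the names stats, lims, pos and first_stop are made up for illustration.

```r
# Quartiles copied from the summary() output above
stats <- c(q1 = 0.8489, median = 1.0489, q3 = 1.5457, max = 95)

# Limits passed to scale_fill_gradientn() in the question
lims <- c(0, 100)

# Relative position of each value on the color scale
# (0 = black end, 1 = yellow end)
pos <- (stats - lims[1]) / diff(lims)
print(round(pos, 4))

# Assumption: 8 evenly spaced colors, so the second color
# ("steelblue4") sits at 1/7 of the scale
first_stop <- 1 / (8 - 1)
print(round(first_stop, 4))

# Values below the first stop are blends of "black" and "steelblue4" only
print(pos < first_stop)
```

Under that assumption, three quarters of the data sit in roughly the first 1.5% of the scale, well before the first blue stop at about 14%, which is why most tiles look black.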
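"But surprisingly, the colors do not match the actual values (i.e., I cannot find a yellow square for ICI = 95 in my data)."
As pointed out in the TLDR, you will not see a yellow square for the exact value ICI = 95 (row 59) either, because it belongs to the taxonomy f__Lachnospiraceae g__-3, and it is not the only value observed for that taxonomy. geom_tile draws one tile per row, so rows with the same Sample_ID and Taxonomy land on the same spot, and what you see is the value of the later row for f__Lachnospiraceae g__-3, which happens to be row 65.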
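The overplotting can be checked without drawing anything. The sketch below retypes only the four rows of the question's data that carry the label f__Lachnospiraceae g__-3 (rows 46, 54, 59 and 65) and asks which of them comes last. It assumes that tiles at the same position are drawn in row order; the names dup_rows and on_top are made up for illustration.

```r
# The four rows sharing one Taxonomy label, with their original row numbers
dup_rows <- data.frame(
  row      = c(46, 54, 59, 65),
  Taxonomy = "f__Lachnospiraceae g__-3",
  ICI      = c(1.38983050847458, 1.06666666666667, 95, 0.976744186046512)
)

# How many rows compete for the same tile position
print(nrow(dup_rows))

# The last row per Taxonomy, i.e. the tile assumed to end up on top
on_top <- dup_rows[!duplicated(dup_rows$Taxonomy, fromLast = TRUE), ]
print(on_top)
```

The surviving row is row 65 with an ICI just under 1, which maps to a near-black fill, while the tile for 95 lies hidden underneath it.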
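As a check, plot only the last 15 rows of the data frame; the visualization shows fewer than 15 tiles:
ggplot(data = tail(ici_table, 15), aes(x=Sample_ID, y=Taxonomy, fill=ICI)) + ...
To reiterate, you will not find 15 squares/tiles for the 15 rows of tail(ici_table, 15). So here lies the main problem: you need to aggregate the values, for example with sum or mean, before visualizing them on the heatmap.
In other words, since you probably want to draw the heatmap from aggregated values (for example, the sum of all observations within each Taxonomy), you should perform the aggregation before plotting.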
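The mismatch between rows and tiles, and the effect of aggregating first, can be reproduced on the last 15 rows alone. The sketch below retypes those rows from the question's data (rows 56 to 70) and uses base R's aggregate() as a stand-in for whatever aggregation you prefer; the names tail15 and agg are made up for illustration.

```r
# Rows 56 to 70 of the question's data (Sample_ID omitted, it is constant)
tail15 <- data.frame(
  Taxonomy = paste("f__Lachnospiraceae", c(
    "g__Marvinbryantia-1", "g__Lachnospiraceae_UCG-006", "g__-1", "g__-3",
    "g__Roseburia", "g__Roseburia-1", "g__A2", "g__Roseburia-1", "g__-2",
    "g__-3", "g__Lachnoclostridium", "g__Lachnoclostridium-1",
    "g__Lachnoclostridium-2", "g__Lachnoclostridium-3",
    "g__Lachnoclostridium-4"
  )),
  ICI = c(1.12903225806452, 0.846153846153846, 0.806201550387597, 95,
          0.694117647058824, 0.970588235294118, 0.836842105263158,
          1.06507592190889, 0.39344262295082, 0.976744186046512,
          1.25196850393701, 1.28333333333333, 0.790960451977401, 0.7,
          1.21995332555426)
)

# 15 rows, but fewer distinct tile positions
print(nrow(tail15))
print(length(unique(tail15$Taxonomy)))

# One mean per Taxonomy, so every tile position holds exactly one value
agg <- aggregate(ICI ~ Taxonomy, data = tail15, FUN = mean)
print(agg[order(-agg$ICI), ][1:3, ])
```

After aggregation there is one row per tile, so no value can be hidden behind another one.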
Solution
The following code visualizes the average ICI (AvgICI), grouped by Taxonomy:
library(dplyr)
library(ggplot2)

# Average ICI per Taxonomy (and Sample_ID) for the last 15 rows, then plot
ici_table %>%
  tail(15) %>%
  group_by(Taxonomy, Sample_ID) %>%
  summarise(AvgICI = mean(ICI)) %>%
  ungroup() %>%
  ggplot(aes(x = Sample_ID, y = Taxonomy, fill = AvgICI)) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(colors = c("black", "steelblue4",
                                  "steelblue3", "steelblue2", "steelblue",
                                  "yellow2", "yellow1", "yellow"),
                       limits = c(0, 100))
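This time you will definitely see the group containing the large outlier (f__Lachnospiraceae g__-3), because its average is 47.988, while the averages of the other groups lie in the range 0 to 1.3.
You can also use sum instead of mean as the aggregation function and observe the difference.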
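To compare the two aggregation functions side by side before plotting, both can be computed in one summarise() call. The sketch below uses a small excerpt retyped from the question's data (the rows labelled g__-3, g__Roseburia-1 and g__A2) rather than the full table; the names excerpt, both, SumICI and n_rows are made up for illustration.

```r
library(dplyr)

# Excerpt of the question's data: rows 46, 54, 59, 65, 61, 63 and 62
excerpt <- data.frame(
  Taxonomy = paste("f__Lachnospiraceae",
                   c("g__-3", "g__-3", "g__-3", "g__-3",
                     "g__Roseburia-1", "g__Roseburia-1", "g__A2")),
  ICI = c(1.38983050847458, 1.06666666666667, 95, 0.976744186046512,
          0.970588235294118, 1.06507592190889, 0.836842105263158)
)

# Sum, mean and row count per Taxonomy in a single pass
both <- excerpt %>%
  group_by(Taxonomy) %>%
  summarise(SumICI = sum(ICI), AvgICI = mean(ICI), n_rows = n(),
            .groups = "drop")

print(both)
```

With sum, a taxonomy that appears in many rows grows with its row count, so the limits passed to scale_fill_gradientn may need to be widened to cover the largest total.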
library(dplyr)
library(ggplot2)

# Full data set: average ICI per Taxonomy, plotted with the theme from the question
brks <- c(0, 10, 25, 50, 75)
ici_table %>%
  group_by(Taxonomy, Sample_ID) %>%
  summarise(AvgICI = mean(ICI)) %>%
  ungroup() %>%
  ggplot(aes(x = Sample_ID, y = Taxonomy, fill = AvgICI)) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(colors = c("black", "steelblue4",
                                  "steelblue3", "steelblue2", "steelblue",
                                  "yellow2", "yellow1", "yellow"),
                       limits = c(0, 100),
                       breaks = brks,
                       labels = brks) +
  labs(x = "ICI (Average)", y = "") +
  theme(panel.background = element_blank(),
        legend.position = "right",
        panel.border = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin = unit(c(1, 1, 1, -0.1), "cm"),
        panel.grid = element_blank())
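With your theme applied, you can still clearly pick out the outlier group. Note that because the values are squeezed close to the lower limit of the scale, most tiles will look approximately black.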