Coloring each data point individually with pre-selected colors in a ggplot box-and-whisker plot

Problem description

I am using ggplot to draw a box-and-whisker plot with every individual data point plotted on top of the boxes.

Here is my current code:

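library(readxl)   # read_excel()
library(ggplot2)  # ggplot() and the geoms/scales used below

# Read the spreadsheet; pathname_ext holds the path to the Excel file
df_3UTR_ext_colors <- read_excel(pathname_ext, sheet = "Sheet1", 
                                 col_types = c("text", "text", "text", 
                                               "numeric", "text", "text"))

# Plotting
# df_3UTR_ext_colors_simple is the data frame whose dput() output appears below
allGenes_colors_simple <- ggplot(df_3UTR_ext_colors_simple, aes(Gene, Length))

allGenes_colors_simple + 
  # Boxes filled by genus; outliers hidden because every point is drawn anyway
  geom_boxplot(outlier.shape = NA, aes(fill = Genus), alpha = 0.5) +
  scale_fill_brewer(palette = "Set1") +
  # Points dodged by the same width so they sit on top of their boxes
  geom_point(aes(fill = Genus), 
             size = 2, shape = 21, position = position_dodge(width = 0.75)) 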

This is the current output of that code:

[Image: output of the code above]

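I would like to color each point individually, using the hex codes I have added to the data frame.

Ideally, each point would be a different shade of its box's color (for example, all viruses in the genus Henipavirus would be different shades of red). I have already picked these shades by hand and stored them as hex codes in the Colors column, in case that is the simplest route.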

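I have tried many iterations to no avail. For example, in geom_point() I currently use aes(fill = Genus). If I replace that with aes(fill = Virus), the result looks like this:

[Image: result of switching to aes(fill = Virus) in geom_point()]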

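Clearly there are several problems here. First, the palette runs out of colors. That is probably easy to fix, so I am not too worried about it. Second, the data points suddenly no longer line up with their boxes and start to drift sideways. Third, this approach also takes away my manual control over the color of each individual point.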

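My feeling is that there are simpler ways to use RColorBrewer to assign a palette to each genus and give each virus its own color within it (in fact, browsing those palettes by hand is how I got the hex codes I added to the data frame). I am not too concerned about that, though, as long as I can get ggplot to color each point individually according to the colors I added manually.

Does anyone have a suggestion?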

> dput(df_3UTR_ext_colors_simple)
structure(list(Genus = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Henipavirus", 
"Morbillivirus", "Rubulavirus", "Respirovirus", "Avulavirus", 
"Aquaparamyxovirus", "Ferlavirus"), class = "factor"), Virus = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 
11L, 11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 
15L, 16L, 16L, 16L, 16L, 16L, 16L), .Label = c("HeV", "NiV", 
"CedV", "GhV", "MojV", "MeV", "CDV", "FeMV", "MuV", "HPIV-2", 
"PIV5", "SeV", "HPIV1", "HPIV3", "APMV-1_NDV", "APMV-3", "AsaPV", 
"FDLV"), class = "factor"), Gene = structure(c(1L, 2L, 3L, 4L, 
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 
5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 
3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L), .Label = c("N", 
"P", "M", "F", "RBP", "RdRp"), class = "factor"), Length = c(568, 
469, 200, 418, 516, 67, 586, 469, 200, 412, 504, 67, 334, 192, 
408, 88, 139, 63, 238, 213, 314, 82, 68, 65, 164, 144, 455, 173, 
543, 50, 59, 72, 426, 137, 84, 176, 59, 72, 407, 132, 111, 65, 
47, 98, 333, 344, 116, 150, 111, 74, 90, 48, 66, 137, 134, 187, 
130, 186, 210, 44, 106, 66, 204, 100, 111, 34, 43, 83, 94, 70, 
104, 85, 43, 113, 94, 88, 110, 100, 43, 122, 61, 38, 90, 71, 
217, 180, 112, 84, 195, 77, 167, 203, 67, 204, 247, 221), Use = c("Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "No", "No", "No", "No", "No", "No", "No", "No", 
"No", "No", "No", "No"), Colors = c("#A50F15", "#A50F15", "#A50F15", 
"#A50F15", "#A50F15", "#A50F15", "#DE2D26", "#DE2D26", "#DE2D26", 
"#DE2D26", "#DE2D26", "#DE2D26", "#FB6A4A", "#FB6A4A", "#FB6A4A", 
"#FB6A4A", "#FB6A4A", "#FB6A4A", "#FCAE91", "#FCAE91", "#FCAE91", 
"#FCAE91", "#FCAE91", "#FCAE91", "#FEE5D9", "#FEE5D9", "#FEE5D9", 
"#FEE5D9", "#FEE5D9", "#FEE5D9", "#3182BD", "#3182BD", "#3182BD", 
"#3182BD", "#3182BD", "#3182BD", "#9ECAE1", "#9ECAE1", "#9ECAE1", 
"#9ECAE1", "#9ECAE1", "#9ECAE1", "#DEEBF7", "#DEEBF7", "#DEEBF7", 
"#DEEBF7", "#DEEBF7", "#DEEBF7", "#31A354", "#31A354", "#31A354", 
"#31A354", "#31A354", "#31A354", "#A1D99B", "#A1D99B", "#A1D99B", 
"#A1D99B", "#A1D99B", "#A1D99B", "#E5F5E0", "#E5F5E0", "#E5F5E0", 
"#E5F5E0", "#E5F5E0", "#E5F5E0", "#756BB1", "#756BB1", "#756BB1", 
"#756BB1", "#756BB1", "#756BB1", "#BCBDDC", "#BCBDDC", "#BCBDDC", 
"#BCBDDC", "#BCBDDC", "#BCBDDC", "#EFEDF5", "#EFEDF5", "#EFEDF5", 
"#EFEDF5", "#EFEDF5", "#EFEDF5", "#E6550D", "#E6550D", "#E6550D", 
"#E6550D", "#E6550D", "#E6550D", "#FEE6CE", "#FEE6CE", "#FEE6CE", 
"#FEE6CE", "#FEE6CE", "#FEE6CE")), row.names = c(NA, -96L), class = c("tbl_df", 
"tbl", "data.frame"))

Tags: r, ggplot2

Solution
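
One possible approach, offered as a sketch under assumptions rather than as an accepted answer: leave the fill scale to the boxes and give the points their own aesthetic. Here the hex codes in Colors are mapped to color and passed through unchanged by scale_color_identity(), while group = Genus is set explicitly so that position_dodge() keeps dodging the points by genus (mapping a per-virus variable without an explicit group changes the grouping, which is a likely cause of the drift described in the question). The data frame df below is a small hand-copied subset of the dput() output (genes N and P for HeV, NiV, MeV and CDV) so that the example runs on its own. Switching to the solid shape 16 is an assumption made for simplicity: it gives up the black outline that shape 21 draws.

```r
library(ggplot2)

# Small subset of the dput() data above (genes N and P only),
# copied by hand so this sketch is self-contained.
df <- data.frame(
  Genus  = factor(rep(c("Henipavirus", "Morbillivirus"), each = 4),
                  levels = c("Henipavirus", "Morbillivirus")),
  Virus  = rep(c("HeV", "NiV", "MeV", "CDV"), each = 2),
  Gene   = factor(rep(c("N", "P"), times = 4), levels = c("N", "P")),
  Length = c(568, 469, 586, 469, 59, 72, 59, 72),
  Colors = rep(c("#A50F15", "#DE2D26", "#3182BD", "#9ECAE1"), each = 2),
  stringsAsFactors = FALSE
)

p <- ggplot(df, aes(Gene, Length)) +
  # Boxes keep the Brewer fill scale, one color per genus
  geom_boxplot(aes(fill = Genus), outlier.shape = NA, alpha = 0.5) +
  scale_fill_brewer(palette = "Set1") +
  # Points take their color straight from the hex codes in Colors;
  # the explicit group keeps the dodging by genus, matching the boxes
  geom_point(aes(color = Colors, group = Genus),
             size = 2, shape = 16,
             position = position_dodge(width = 0.75)) +
  # Use the values in Colors as-is instead of mapping them to a palette
  scale_color_identity()

print(p)
```

With the full data frame, the same layers should apply after replacing df with df_3UTR_ext_colors_simple. If the outlined shape 21 is required, both layers would need the fill aesthetic, which calls for a different technique than the one sketched here.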

