Using ggplot with non-numeric data

Problem description

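My dataset (at the bottom) describes the conservation status of various frog species that are resistant or susceptible to a particular disease, "Bd". The problem is that not much data is available (for example, many genome sequences are missing). I want to see whether the lack of available data is related to conservation status or disease susceptibility (if the frogs die off too quickly, we won't have any data!). I converted some columns from character to factor with the following command: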

library(dplyr)  # provides mutate_at() and vars()
df <- mutate_at(df, vars(IUCN_Red_List_status, Bd_resistance, Genome_availability, Phylum, Last.assessed), as.factor)

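I would like to visualize and compare which species are increasing/stable/decreasing (the IUCN_Red_List_status column), which species have NA in the Genome_availability and GeneID columns, and which species are resistant to Bd (N = no, Y = yes, U = unknown). I tried the command below, but the resulting plot is not helpful. The problem is that none of this data is numeric, so I am struggling to create any plot at all. Can someone help me get started?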

ggplot(data = df) + geom_bar(mapping = aes(x = Species, group = IUCN_Red_List_status))



df <- structure(list(Species = c("Lithobates catesbeianus", "Lithobates sylvatica", 
"Xenopus laevis", "Xenopus tropicalis", "Pyxicephalus adspersus", 
"Nanorana parkeri", "Rhinella marina", "Rana temporaria", "Oophaga pumilio", 
"Amolops mantzorum", "Rana saharica", "Pseudacris regilla", "Babina daunchina", 
"Odorrana schmackeri"), Phylum = structure(c(2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Anura ", 
"Chordata"), class = "factor"), IUCN_Red_List_status = structure(c(2L, 
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 3L, 1L), .Label = c("Decreasing", 
"Increasing", "Stable"), class = "factor"), Last.assessed = structure(c(6L, 
6L, 3L, 7L, 4L, 1L, 3L, 3L, 5L, 8L, 3L, 1L, 8L, 2L), .Label = c("2004", 
"2005", "2008", "2013", "2014", "2015", "2018", "2019"), class = "factor"), 
    National_status = c("LC", "LC", "LC", "LC", "NT", "LC", "LC", 
    "LC", "U", "U", "LC", "LC", "LC", "NT"), Bd_resistance = structure(c(3L, 
    1L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), .Label = c("N", 
    "U", "Y"), class = "factor"), Genome_availability = structure(c(2L, 
    1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N", 
    "Y"), class = "factor"), Gene_name = c("ranalexin", NA, NA, 
    "NPC intracellular cholesterol transporter 1", NA, "uncharacterized LOC", 
    NA, "temporin G precursor", NA, "pleurain-B-MT1 antimicrobial peptide precursor", 
    "temporin-SHe precursor", "temporin-3PR precursor", "peptide-DN4", 
    "breinin-1S precursor"), GeneID = c("S69903.1", NA, NA, "XM_004915212.4", 
    NA, "XR_001941307.1", NA, "Y09395.1", NA, "HQ128621.1", "FN557008.1", 
    "JQ511833.1", "Q1286626.1", "AJ971790.1"), Score = c(435, 
    NA, NA, 43.7, NA, 98.7, NA, 288, NA, 229, 255, 266, 224, 
    231), E.value = c(8e-118, NA, NA, 0.008, NA, 3e-20, NA, 4e-79, 
    NA, 8e-62, 9e-70, 2e-61, 2e-61, 4e-62), Identities = c("268/283", 
    NA, NA, "41/52", NA, "93/119", NA, "237/281", NA, "146/155", 
    "171/187", "147/158", "145/155", "141/146"), Identities.percent = c(95L, 
    NA, NA, 79L, NA, 78L, NA, 84L, NA, 94L, 91L, 93L, 94L, 97L
    ), Gaps = c("6/283", NA, NA, " 3/52", NA, "0/119", NA, "11/281", 
    NA, "3/155", "3/187", "4/158", "3/155", "3/146"), Gaps.percent = c(2L, 
    NA, NA, 5L, NA, 0L, NA, 3L, NA, 1L, 1L, 2L, 1L, 2L)), class = "data.frame", row.names = c(NA, 
-14L))

Tags: r

Solution


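This question essentially boils down to "how do I plot four-dimensional factor data on a two-dimensional plot?" One answer is to use the x axis, the y axis, the fill scale, and facets to represent the four dimensions you are plotting: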

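library(ggplot2)

# Flag each species by whether its GeneID is NA
df$missing_geneId <- "Has Gene Info"
df$missing_geneId[is.na(df$GeneID)] <- "Missing Gene Info"

# x = Bd resistance, y = species, fill/outline = IUCN status, facet = gene info
ggplot(df, aes(Bd_resistance, Species, fill = IUCN_Red_List_status,
               colour = IUCN_Red_List_status)) + 
  geom_tile(alpha = 0.5) + 
  facet_grid(. ~ missing_geneId) +
  theme_bw()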

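If the goal is to see whether missing gene information tends to go along with a particular conservation status, counting species per category may be easier to read than one tile per species. The following is only a rough sketch, not part of the original answer. The small `frogs` data frame and its `GeneID_missing` column are illustrative names introduced here, with values copied from the question's data (status, Bd resistance, and whether GeneID is NA). It assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical helper data frame: three columns copied from the question's
# data, one entry per species (14 rows), in the original row order.
frogs <- data.frame(
  IUCN_Red_List_status = c("Increasing", "Stable", "Increasing", "Stable",
                           "Decreasing", "Stable", "Increasing", "Stable",
                           "Decreasing", "Decreasing", "Stable", "Stable",
                           "Stable", "Decreasing"),
  Bd_resistance = c("Y", "N", "Y", "N", "U", "U", "N",
                    "U", "N", "U", "U", "N", "U", "U"),
  # TRUE where GeneID is NA in the question's data
  GeneID_missing = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE,
                     FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Same labels as the missing_geneId column used in the answer
frogs$missing_geneId <- ifelse(frogs$GeneID_missing,
                               "Missing Gene Info", "Has Gene Info")

# Cross-tabulate conservation status against gene-info availability
status_by_gene_info <- table(frogs$IUCN_Red_List_status, frogs$missing_geneId)
print(status_by_gene_info)

# Bar heights are species counts; one panel per Bd resistance level
p <- ggplot(frogs, aes(x = IUCN_Red_List_status, fill = missing_geneId)) +
  geom_bar(position = "dodge") +
  facet_grid(. ~ Bd_resistance) +
  labs(y = "Number of species") +
  theme_bw()
p
```

With only 14 species the counts per cell are very small, so a plot like this is best treated as exploratory rather than as evidence of a real association.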

