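r - Using ggplot with non-numeric data
Question
My dataset (at the bottom) describes the conservation status of various frog species that are either resistant or susceptible to a particular disease, "Bd". The problem is that not much data is available (for example, many genome sequences are missing). I want to see whether this lack of data correlates with conservation status or disease susceptibility (if the frogs die too quickly, we will have no data!). I converted some columns from character to factor with the following command: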
library(dplyr)
df <- mutate_at(df, vars(IUCN_Red_List_status, Bd_resistance, Genome_availability, Phylum, Last.assessed), as.factor)
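For readers who do not have dplyr loaded, the same character-to-factor conversion can be sketched in base R. This is a minimal sketch, not the asker's code: `toy` and `factor_cols` are hypothetical names, and the three rows are copied from the dataset at the bottom only to keep the example self-contained. (dplyr also offers `across()` inside `mutate()` for the same job.)

```r
# Toy data frame standing in for the question's df
# (three species and two columns taken from the dataset below)
toy <- data.frame(
  Species = c("Xenopus laevis", "Rana temporaria", "Oophaga pumilio"),
  Bd_resistance = c("Y", "U", "N"),
  Genome_availability = c("Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector: the columns that should become factors
factor_cols <- c("Bd_resistance", "Genome_availability")

# lapply() returns a list of factors, which is assigned back column by column
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

# Species stays character; the two selected columns are now factors
str(toy)
```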
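I would like to visualize and compare which species are increasing/stable/decreasing (the IUCN_Red_List_status column), which species have NA in the Genome_availability and GeneID columns, and which species are resistant to Bd (N = no, Y = yes, U = unknown). I tried the command below, but the resulting plot is not helpful. The problem is that none of this data is numeric, so I am struggling to create any kind of chart. Can someone help me get started?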
ggplot(data = df) + geom_bar(mapping = aes(x = Species, group = IUCN_Red_List_status))
structure(list(Species = c("Lithobates catesbeianus", "Lithobates sylvatica",
"Xenopus laevis", "Xenopus tropicalis", "Pyxicephalus adspersus",
"Nanorana parkeri", "Rhinella marina", "Rana temporaria", "Oophaga pumilio",
"Amolops mantzorum", "Rana saharica", "Pseudacris regilla", "Babina daunchina",
"Odorrana schmackeri"), Phylum = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Anura ",
"Chordata"), class = "factor"), IUCN_Red_List_status = structure(c(2L,
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 1L, 3L, 3L, 3L, 1L), .Label = c("Decreasing",
"Increasing", "Stable"), class = "factor"), Last.assessed = structure(c(6L,
6L, 3L, 7L, 4L, 1L, 3L, 3L, 5L, 8L, 3L, 1L, 8L, 2L), .Label = c("2004",
"2005", "2008", "2013", "2014", "2015", "2018", "2019"), class = "factor"),
National_status = c("LC", "LC", "LC", "LC", "NT", "LC", "LC",
"LC", "U", "U", "LC", "LC", "LC", "NT"), Bd_resistance = structure(c(3L,
1L, 3L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L), .Label = c("N",
"U", "Y"), class = "factor"), Genome_availability = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("N",
"Y"), class = "factor"), Gene_name = c("ranalexin", NA, NA,
"NPC intracellular cholesterol transporter 1", NA, "uncharacterized LOC",
NA, "temporin G precursor", NA, "pleurain-B-MT1 antimicrobial peptide precursor",
"temporin-SHe precursor", "temporin-3PR precursor", "peptide-DN4",
"breinin-1S precursor"), GeneID = c("S69903.1", NA, NA, "XM_004915212.4",
NA, "XR_001941307.1", NA, "Y09395.1", NA, "HQ128621.1", "FN557008.1",
"JQ511833.1", "Q1286626.1", "AJ971790.1"), Score = c(435,
NA, NA, 43.7, NA, 98.7, NA, 288, NA, 229, 255, 266, 224,
231), E.value = c(8e-118, NA, NA, 0.008, NA, 3e-20, NA, 4e-79,
NA, 8e-62, 9e-70, 2e-61, 2e-61, 4e-62), Identities = c("268/283",
NA, NA, "41/52", NA, "93/119", NA, "237/281", NA, "146/155",
"171/187", "147/158", "145/155", "141/146"), Identities.percent = c(95L,
NA, NA, 79L, NA, 78L, NA, 84L, NA, 94L, 91L, 93L, 94L, 97L
), Gaps = c("6/283", NA, NA, " 3/52", NA, "0/119", NA, "11/281",
NA, "3/155", "3/187", "4/158", "3/155", "3/146"), Gaps.percent = c(2L,
NA, NA, 5L, NA, 0L, NA, 3L, NA, 1L, 1L, 2L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-14L))
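Answer
This question essentially boils down to "how do I plot 4-dimensional factor data on a two-dimensional plot?". One answer is to use the x axis, the y axis, the fill scale, and facets to represent the 4 dimensions you are plotting: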
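library(ggplot2)

# Flag species that have no gene record (GeneID is NA)
df$missing_geneId <- "Has Gene Info"
df$missing_geneId[is.na(df$GeneID)] <- "Missing Gene Info"

# x = Bd resistance, y = species, fill/colour = population trend, facets = gene-data availability
ggplot(df, aes(Bd_resistance, Species, fill = IUCN_Red_List_status,
               colour = IUCN_Red_List_status)) +
  geom_tile(alpha = 0.5) +
  facet_grid(. ~ missing_geneId) +
  theme_bw()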
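Since the question also asks whether missing gene data tracks conservation status, a plain cross-tabulation may be a useful companion to the tile plot. The following is a minimal base R sketch under stated assumptions: `toy`, `status_tab`, and `status_prop` are hypothetical names, and the six rows are the first six species from the question's data, so the counts are illustrative rather than a result for the full dataset.

```r
# First six rows of the question's data, reduced to the columns needed here
toy <- data.frame(
  IUCN_Red_List_status = c("Increasing", "Stable", "Increasing",
                           "Stable", "Decreasing", "Stable"),
  GeneID = c("S69903.1", NA, NA, "XM_004915212.4", NA, "XR_001941307.1"),
  stringsAsFactors = FALSE
)

# Same flag as in the answer: does the species have a gene record?
toy$missing_geneId <- ifelse(is.na(toy$GeneID),
                             "Missing Gene Info", "Has Gene Info")

# Count species per population trend and gene-data availability
status_tab <- table(toy$IUCN_Red_List_status, toy$missing_geneId)

# Row-wise proportions: share of each trend group that lacks gene data
status_prop <- prop.table(status_tab, margin = 1)

print(status_tab)
print(round(status_prop, 2))
```

With only 14 species in the full dataset, a table like this is descriptive and not a statistical test. The same counts can be drawn as stacked bars by mapping x = IUCN_Red_List_status and fill = missing_geneId in geom_bar().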