r - Why does one factor level become NA when I reorder the levels?
Problem description
I have the following data:
> dataAvg
# A tibble: 20 x 3
# Groups: Date [5]
Date Rate meanNitrogen
<fct> <fct> <dbl>
1 7.16 Rate 1 1.36
2 7.16 Rate 2 1.29
3 7.16 Rate 3 1.40
4 7.16 Rate 4 1.11
5 7.22 Rate 1 1.41
6 7.22 Rate 2 1.34
7 7.22 Rate 3 1.62
8 7.22 Rate 4 1.08
9 7.29 Rate 1 1.38
10 7.29 Rate 2 1.39
11 7.29 Rate 3 1.51
12 7.29 Rate 4 1.14
13 7.8 Rate 1 1.34
14 7.8 Rate 2 1.38
15 7.8 Rate 3 1.38
16 7.8 Rate 4 1.08
17 8.05 Rate 1 1.39
18 8.05 Rate 2 1.35
19 8.05 Rate 3 1.42
20 8.05 Rate 4 1.02
I'm trying to make the following ggplot:
ggplot(dataAvg, aes(x=Date, y=meanNitrogen, group=Rate)) +
geom_bar(stat="identity") +
facet_wrap(.~Rate)
However, Date (a factor) is ordered alphabetically rather than chronologically. To change this, I added the following line of code:
dataAvg$Date <- factor(dataAvg$Date,levels(dataAvg$Date)[c(4,1,2,3,5)])
Here is the output before changing the order:
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L,
6L), .Label = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05",
"8.18"), class = "factor"), Rate = structure(c(1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L,
1L, 2L, 3L, 4L), .Label = c("Rate 1", "Rate 2", "Rate 3", "Rate 4"
), class = "factor"), meanNitrogen = c(4.955, 5.005, 5.1075,
4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195, 4.825, 5.325, 3.765,
5.0225, 4.93, 5.3925, 3.82, 5.2225, 5.34, 5.2025, 4.0225, 4.43,
4.3775, 4.725, 3.7025)), row.names = c(NA, -24L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), groups = structure(list(Date = structure(1:6, .Label = c("7.1",
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"),
.rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
And here is the output afterwards:
> dput(dataAvg)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L,
3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 6L, 6L, 6L,
6L), .Label = c("7.1", "7.8", "7.16", "7.22", "7.29", "8.05"), class = "factor"),
Rate = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("Rate 1",
"Rate 2", "Rate 3", "Rate 4"), class = "factor"), meanNitrogen = c(4.955,
5.005, 5.1075, 4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195,
4.825, 5.325, 3.765, 5.0225, 4.93, 5.3925, 3.82, 5.2225,
5.34, 5.2025, 4.0225, 4.43, 4.3775, 4.725, 3.7025)), row.names = c(NA,
-24L), groups = structure(list(Date = structure(1:6, .Label = c("7.1",
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"),
.rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
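In other cases this has fixed the problem, but here I lose the "8.05" date in the ggplot: it is replaced by an "NA" value. Searching Stack Overflow and elsewhere, I couldn't find a solution. Any help getting rid of the NA would be greatly appreciated. Thanks!
Solution
I'm going to make some suggestions that don't answer your question as written, but that I think could improve the data visualization. See if you agree.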
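A likely explanation, judging from the two dput outputs: the factor actually has seven levels ("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), not the five visible in the printed tibble. The positions c(4,1,2,3,5) would give the right chronological order for the five-level version, but on the seven-level factor they keep only "7.29", "7.1", "7.16", "7.22" and "7.8". factor() turns every value whose label is not among the new levels into NA, which is what happens to "8.05". A minimal base-R sketch reproducing this; the vector x is hypothetical, with one value per date and the level labels copied from the dput:

```r
# Hypothetical data: one value per date, levels copied from the dput output
x <- factor(c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05"),
            levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"))

# Hard-coded positions select only 5 of the 7 existing levels
kept <- levels(x)[c(4, 1, 2, 3, 5)]
print(kept)  # "7.29" "7.1" "7.16" "7.22" "7.8"

# Values whose label is missing from the new levels ("8.05") become NA
y <- factor(x, levels = kept)
print(table(y, useNA = "ifany"))
```

In other words, hard-coded level positions silently break as soon as the set of levels changes.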
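- If you have dates, use date as the variable type
- Plot lines (or points) instead of columns
Assuming the year of your dates is 2020 and the current format is month.day, you can convert them using dplyr::mutate: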
library(dplyr)
library(ggplot2)
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_line() +
facet_wrap(~Rate)
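The conversion inside mutate() can be checked on its own. A small base-R sketch, under the same assumptions (year 2020, labels in month.day form); note that with this format "7.1" is read as July 1 and "7.8" as July 8, so it is worth confirming that the labels were not produced from numbers that lost a trailing zero:

```r
# Month.day labels as they appear in the dput output (year assumed to be 2020)
labs <- c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05")

# Same format string as in the dplyr pipeline: month.day.year
d <- as.Date(paste0(labs, ".2020"), "%m.%d.%Y")
print(d)

# Ordering by the parsed Date gives chronological rather than alphabetical order
print(labs[order(d)])
```

Because ggplot2 treats a Date column as continuous, the x axis is then ordered by time automatically, with no level reordering needed.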
Edit: Since your focus is on comparing rates on a given date, a better line chart would color by rate instead of using facets.
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_line(aes(color = Rate))
Or, if you think columns are clearer:
dataAvg %>%
mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
ggplot(aes(newDate, meanNitrogen)) +
geom_col(aes(fill = Rate), position = position_dodge())
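If you would rather keep Date as a factor (for example, for the original faceted bar chart), one option is to derive the level order from the labels themselves instead of hard-coding positions, so that no existing level can be dropped. A base-R sketch under the same 2020/month.day assumption; the helper name reorder_month_day is hypothetical:

```r
# Hypothetical helper: reorder a factor of "month.day" labels chronologically.
# Assumes every level parses as month.day within a single year (2020 by default).
reorder_month_day <- function(f, year = 2020) {
  lev <- levels(f)
  # Parse each level label into a Date
  parsed <- as.Date(paste0(lev, ".", year), "%m.%d.%Y")
  if (anyNA(parsed)) stop("some levels are not valid month.day labels")
  # Rebuild the factor with ALL existing levels, sorted by date
  factor(f, levels = lev[order(parsed)])
}

# Example using the seven levels from the dput output
x <- factor(c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05"),
            levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"))
x2 <- reorder_month_day(x)
print(levels(x2))
print(anyNA(x2))
```

Applied to the question's data, this would replace the hard-coded line with dataAvg$Date <- reorder_month_day(dataAvg$Date).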