Why does one factor level become NA when I reorder the levels?

Problem description

I have the following data:

> dataAvg
# A tibble: 20 x 3
# Groups:   Date [5]
   Date  Rate   meanNitrogen
   <fct> <fct>         <dbl>
 1 7.16  Rate 1         1.36
 2 7.16  Rate 2         1.29
 3 7.16  Rate 3         1.40
 4 7.16  Rate 4         1.11
 5 7.22  Rate 1         1.41
 6 7.22  Rate 2         1.34
 7 7.22  Rate 3         1.62
 8 7.22  Rate 4         1.08
 9 7.29  Rate 1         1.38
10 7.29  Rate 2         1.39
11 7.29  Rate 3         1.51
12 7.29  Rate 4         1.14
13 7.8   Rate 1         1.34
14 7.8   Rate 2         1.38
15 7.8   Rate 3         1.38
16 7.8   Rate 4         1.08
17 8.05  Rate 1         1.39
18 8.05  Rate 2         1.35
19 8.05  Rate 3         1.42
20 8.05  Rate 4         1.02

I am trying to make the following ggplot:

ggplot(dataAvg, aes(x=Date, y=meanNitrogen, group=Rate)) + 
  geom_bar(stat="identity") + 
  facet_wrap(.~Rate)

However, Date (a factor) is ordered alphabetically rather than chronologically. To change this, I added the following line of code:

dataAvg$Date <- factor(dataAvg$Date,levels(dataAvg$Date)[c(4,1,2,3,5)])

This is the dput output before changing the order:

structure(list(Date = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L), .Label = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", 
"8.18"), class = "factor"), Rate = structure(c(1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 
1L, 2L, 3L, 4L), .Label = c("Rate 1", "Rate 2", "Rate 3", "Rate 4"
), class = "factor"), meanNitrogen = c(4.955, 5.005, 5.1075, 
4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195, 4.825, 5.325, 3.765, 
5.0225, 4.93, 5.3925, 3.82, 5.2225, 5.34, 5.2025, 4.0225, 4.43, 
4.3775, 4.725, 3.7025)), row.names = c(NA, -24L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), groups = structure(list(Date = structure(1:6, .Label = c("7.1", 
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"), 
    .rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))

This is the output afterwards:

 > dput(dataAvg)
structure(list(Date = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 
3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 2L, 2L, 2L, 2L, 6L, 6L, 6L, 
6L), .Label = c("7.1", "7.8", "7.16", "7.22", "7.29", "8.05"), class = "factor"), 
    Rate = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 
    3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("Rate 1", 
    "Rate 2", "Rate 3", "Rate 4"), class = "factor"), meanNitrogen = c(4.955, 
    5.005, 5.1075, 4.01, 6.3325, 5.485, 6.1825, 4.2275, 5.195, 
    4.825, 5.325, 3.765, 5.0225, 4.93, 5.3925, 3.82, 5.2225, 
    5.34, 5.2025, 4.0225, 4.43, 4.3775, 4.725, 3.7025)), row.names = c(NA, 
-24L), groups = structure(list(Date = structure(1:6, .Label = c("7.1", 
"7.16", "7.22", "7.29", "7.8", "8.05", "8.18"), class = "factor"), 
    .rows = list(1:4, 5:8, 9:12, 13:16, 17:20, 21:24)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

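In other cases this has fixed the problem, but here I lose the "8.05" date in the ggplot; it is replaced with an "NA" value. I could not find a solution when searching Stack Overflow or elsewhere. Any help getting rid of the NA would be much appreciated. Thanks!

Tags: r

Solution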


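I am going to make a few suggestions that do not answer your question as written, but that I think may improve the data visualization. See whether you agree.

  1. If you have dates, use Date as the variable type
  2. Draw lines (or points) rather than columns

Assuming the year of your dates is 2020 and the current format is month.day, you can convert them using dplyr::mutate: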

library(dplyr)
library(ggplot2)

dataAvg %>% 
  mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>% 
  ggplot(aes(newDate, meanNitrogen)) + 
  geom_line() + 
  facet_wrap(~Rate)
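
Before converting the whole column, it may help to check the format string on a few labels in base R. This is only a small sketch: the vector `labels` is made up for illustration and is not part of the original data frame, and the year 2020 is the same assumption as above.

```r
# Illustrative labels in month.day form (hypothetical vector)
labels <- c("7.16", "7.8", "8.05")

# Append the assumed year and parse as month.day.year
parsed <- as.Date(paste0(labels, ".2020"), format = "%m.%d.%Y")

# Parsed dates sort chronologically, so "7.8" (8 July) comes first
format(sort(parsed))  # "2020-07-08" "2020-07-16" "2020-08-05"
```

Once the column is a real Date, ggplot2 places the values on a continuous date axis, so no manual reordering of factor levels is needed.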


Edit: since your focus is on comparing the rates on a given date, a better line chart would color by rate instead of using facets.

dataAvg %>% 
  mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
  ggplot(aes(newDate, meanNitrogen)) + 
  geom_line(aes(color = Rate))

Or, if you think columns are clearer:

dataAvg %>% 
  mutate(newDate = as.Date(paste0(Date, ".2020"), "%m.%d.%Y")) %>%
  ggplot(aes(newDate, meanNitrogen)) + 
  geom_col(aes(fill = Rate), position = position_dodge())
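
As for the question as written, one likely explanation, judging from the dput output in the question, is that the factor carries seven levels (including "7.1" and "8.18"), while `c(4,1,2,3,5)` selects only five of them. `factor()` turns every value whose label is not among the supplied levels into `NA`, which would account for "8.05" disappearing. The sketch below reproduces this on a small made-up factor and shows one way to reorder the levels chronologically while keeping the column a factor. The vector `d` and the year 2020 are assumptions for illustration only.

```r
# Made-up factor with unused levels, mimicking the level set in the dput output
d <- factor(c("7.16", "7.22", "7.29", "7.8", "8.05"),
            levels = c("7.1", "7.16", "7.22", "7.29", "7.8", "8.05", "8.18"))

# Index-based reordering picks 5 of the 7 levels, so "8.05" is left out
bad <- factor(d, levels(d)[c(4, 1, 2, 3, 5)])
levels(bad)      # "7.29" "7.1" "7.16" "7.22" "7.8"
sum(is.na(bad))  # 1

# Alternative: drop unused levels, then order the rest by parsed date
# (the year 2020 is an assumption)
lv <- levels(droplevels(d))
lv <- lv[order(as.Date(paste0(lv, ".2020"), format = "%m.%d.%Y"))]
good <- factor(d, levels = lv)
levels(good)     # "7.8" "7.16" "7.22" "7.29" "8.05"
anyNA(good)      # FALSE
```

Ordering the levels by name or by parsed date, rather than by position, avoids depending on how many levels the factor happens to have.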
