How can I add these data points to the chart clearly, without them looking the way they do?

Problem description

I have two data frames (the dput()s are at the end of this question) that I want to plot on the same chart.

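I want to show the number of first and second appointments for any given date (as columns), together with the number of each type of vaccination given on each date, broken down by site. I have already run a count on the raw data (using dplyr), but I think that, because the counts are per site per day, my chart shows stacked values rather than single/total values.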

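I strongly suspect that my approach to processing the data is wrong, and that this is why the columns and lines look the way they do; it seems wrong on many levels.

I think the columns are broken into segments (because each is a combination of many values), all stacked on top of one another, and I believe the same is true of the line.

As for the line, something is clearly wrong, because it seems to jump from one column to the next; there is no smooth transition. I have split the data into single-day values, but this still happens.

(I added bold colours for the sake of this example; the chart is not in its final form.)

I tried using merge to combine the datasets, but still got the same result; I am sure there is a better way to do this.

Any suggestions would be great.

Code to merge the data frames: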

merged <- merge(df, df2, by = 1)
colnames(merged)[1] <- "apptDTS" # Change first column name

Chart code:

ggplot(merged) +
geom_col(aes(apptDTS, n.x), fill = "yellow", colour = "black") +
geom_col(aes(apptDTS, n.y), fill = "blue", colour = "black") +
geom_line(aes(x = apptDTS, y = n.x),
          colour = "green") +
geom_line(aes(x = apptDTS, y = n.y),
          colour = "red")

The dput()s:

df <- structure(list(FirstApptDTS = structure(c(1609718400, 1609718400, 
1609718400, 1609718400, 1609804800, 1609804800, 1609804800, 1609804800, 
1609891200, 1609891200, 1609891200, 1609891200, 1609977600, 1609977600, 
1609977600, 1609977600, 1610064000, 1610064000, 1610064000, 1610064000, 
1610150400, 1610150400, 1610150400, 1610150400, 1610409600, 1610409600, 
1610409600, 1610409600, 1610409600, 1610496000, 1610496000, 1610496000, 
1610496000, 1610496000, 1610582400, 1610582400, 1610582400, 1610582400, 
1610582400, 1610668800, 1610668800, 1610668800, 1610668800, 1610668800, 
1610755200, 1610755200, 1610755200, 1610755200, 1610755200, 1610928000, 
1610928000, 1610928000, 1610928000, 1610928000, 1610928000, 1611014400, 
1611014400, 1611014400, 1611014400, 1611014400, 1611014400, 1611100800, 
1611100800, 1611100800, 1611100800, 1611100800, 1611100800, 1611187200, 
1611187200, 1611187200, 1611187200, 1611187200, 1611273600, 1611273600, 
1611273600, 1611273600, 1611273600, 1611360000, 1611360000, 1611360000, 
1611360000, 1611360000, 1611360000, 1611532800, 1611532800, 1611532800, 
1611532800, 1611532800, 1611532800, 1611532800, 1611619200, 1611619200, 
1611619200, 1611619200, 1611619200, 1611705600, 1611705600, 1611705600, 
1611705600, 1611705600, 1611792000, 1611792000, 1611792000, 1611792000, 
1611792000, 1611878400, 1611878400, 1611878400, 1611878400, 1611878400, 
1611964800, 1611964800, 1611964800, 1611964800, 1611964800), class = c("POSIXct", 
"POSIXt"), tzone = ""), firstSiteLocation = c("GHGA", "LBVC1", 
"STHSTVC", "STHSTVC", "GHGA", "LBVC1", "STHSTVC", "STHSTVC", 
"GHGA", "LBVC1", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", "STHSTVC", 
"STHSTVC", "GHGA", "LBVC1", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", 
"STHSTVC", "STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", 
"GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"STHSTVC", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", 
"STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "STHSTVC", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "STHSTVC", "VC2", "WBVC1", "GHGA", "LBVC1", 
"LBVC2", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", 
"WBVC1", "GHGA", "LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", 
"LBVC1", "LBVC2", "STHSTVC", "WBVC1", "GHGA", "LBVC1", "LBVC2", 
"STHSTVC", "WBVC1"), VaccineTypeCD = c("DEF", "DEF", "ABC", "DEF", 
"DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", 
"ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", 
"DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "DEF", 
"ABC", "DEF", "DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "ABC", "DEF", "DEF", "DEF", "DEF", 
"DEF", "ABC", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
"DEF", "DEF", "DEF"), n = c(134L, 283L, 3L, 10L, 122L, 120L, 
18L, 128L, 148L, 534L, 481L, 22L, 151L, 520L, 529L, 7L, 174L, 
539L, 535L, 3L, 185L, 540L, 494L, 3L, 91L, 321L, 491L, 12L, 495L, 
82L, 329L, 493L, 6L, 534L, 86L, 423L, 517L, 2L, 496L, 111L, 394L, 
505L, 2L, 498L, 401L, 547L, 518L, 2L, 362L, 443L, 481L, 555L, 
1L, 524L, 153L, 446L, 452L, 493L, 1L, 426L, 288L, 472L, 463L, 
558L, 1L, 381L, 317L, 491L, 592L, 610L, 566L, 471L, 496L, 606L, 
615L, 572L, 561L, 472L, 564L, 557L, 1L, 577L, 584L, 534L, 598L, 
570L, 1L, 594L, 1L, 553L, 492L, 581L, 570L, 610L, 573L, 484L, 
580L, 575L, 571L, 554L, 482L, 590L, 596L, 533L, 395L, 489L, 570L, 
606L, 486L, 413L, 495L, 497L, 538L, 441L, 264L)), row.names = c(59L, 
61L, 63L, 64L, 66L, 68L, 70L, 71L, 73L, 74L, 76L, 77L, 79L, 81L, 
83L, 84L, 86L, 88L, 90L, 91L, 93L, 95L, 97L, 98L, 109L, 111L, 
113L, 115L, 116L, 118L, 120L, 122L, 124L, 125L, 127L, 129L, 131L, 
133L, 134L, 136L, 138L, 140L, 142L, 143L, 145L, 147L, 149L, 151L, 
152L, 154L, 156L, 158L, 160L, 161L, 163L, 165L, 167L, 169L, 171L, 
172L, 174L, 176L, 178L, 180L, 182L, 183L, 185L, 187L, 189L, 191L, 
193L, 195L, 197L, 199L, 201L, 203L, 205L, 207L, 209L, 211L, 213L, 
214L, 216L, 218L, 220L, 222L, 224L, 225L, 228L, 229L, 231L, 233L, 
235L, 237L, 239L, 241L, 243L, 245L, 247L, 249L, 251L, 253L, 255L, 
257L, 259L, 261L, 263L, 265L, 267L, 269L, 271L, 273L, 275L, 277L, 
279L), class = "data.frame")

df2 <- structure(list(SecondApptDTS = structure(c(1609545600, 1609804800, 
1609891200, 1609977600, 1610064000, 1610150400, 1610409600, 1610409600, 
1610496000, 1610496000, 1610496000, 1610582400, 1610582400, 1610668800, 
1610668800, 1610668800, 1610755200, 1611014400, 1611187200, 1611705600, 
1611878400, 1611964800, NA), class = c("POSIXct", "POSIXt"), tzone = ""), 
    secondSiteLocation = c("GHGA", "GHGA", "GHGA", "GHGA", "GHGA", 
    "GHGA", "GHGA", "LBVC1", "GHGA", "LBVC1", "STHSTVC", "GHGA", 
    "LBVC1", "GHGA", "LBVC1", "LBVC2", "GHGA", "LBVC1", "GHGA", 
    "GHGA", "STHSTVC", "GHGA", NA), VaccineType2CD = c("DEF", 
    "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
    "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", "DEF", 
    "DEF", "DEF", "DEF", NA), n = c(1L, 1L, 254L, 199L, 274L, 
    269L, 325L, 157L, 284L, 197L, 2L, 295L, 123L, 257L, 123L, 
    1L, 1L, 1L, 4L, 2L, 1L, 3L, NA)), row.names = c("24", "28", 
"31", "34", "37", "40", "47", "49", "51", "53", "55", "57", "59", 
"62", "64", "66", "67", "68", "73", "75", "77", "78", "NA"), class = "data.frame")

Tags: r, ggplot2

Solution


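If I understand correctly, the OP wants to show

  • the number of first and second appointments for any given date
  • the number of each type of vaccine given on each date
  • broken down by site.

However, I am not sure I have fully understood the requirements, so my answer may need to be adjusted based on the OP's feedback.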

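Here is what I would do with my preferred tools (I am more familiar with, and faster in, data.table than dplyr). Most importantly, I do not merge() the datasets but rbind() them, with an id column that distinguishes first from second appointments.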

library(data.table)
library(ggplot2)   # needed for the ggplot() calls below
library(magrittr)  # provides the %>% pipe
cols <- c("appDTS", "siteLocation", "vaccineType", "n")
combi <- list(df, df2) %>% 
  lapply(setDT) %>% 
  lapply(setnames, cols) %>% 
  rbindlist(idcol = "appt") %>%
  .[, appt := factor(appt, labels = c("First", "Second"))]
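
Why stacking rows helps where merge() did not can be seen on a small example. The sketch below uses base R only, and the two toy data frames (first, second) and their values are hypothetical, not the OP's data. Because each date occurs on several rows in both tables, merging on the date pairs every first-appointment row with every second-appointment row of that day, so counts are repeated; binding rows keeps each count exactly once.

```r
# Hypothetical toy data: two sites on the first day, one on the second
first <- data.frame(
  date = as.Date(c("2021-01-04", "2021-01-04", "2021-01-05")),
  site = c("A", "B", "A"),
  n    = c(10L, 20L, 30L)
)
second <- data.frame(
  date = as.Date(c("2021-01-04", "2021-01-04", "2021-01-05")),
  site = c("A", "B", "A"),
  n    = c(1L, 2L, 3L)
)

# Merging on the date alone: 2 x 2 rows for the first day plus 1 x 1 for the second
merged <- merge(first, second, by = "date")
nrow(merged)      # 5 rows instead of 3
sum(merged$n.x)   # 90, although the true first-appointment total is 60

# Stacking with an id column keeps one row per original count
stacked <- rbind(
  cbind(appt = "First",  first),
  cbind(appt = "Second", second)
)
nrow(stacked)                                 # 6 rows
sum(stacked$n[stacked$appt == "First"])       # 60
```

The stacked layout is also the long format that ggplot2 maps most naturally to a fill or colour aesthetic, which is what the plots below rely on.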

# 1st plot
ggplot(combi) + 
  aes(appDTS, n, fill = appt) + 
  geom_col() +
  scale_fill_brewer(palette = "Paired")


# 2nd plot
ggplot(combi) + 
  aes(appDTS, n, fill = vaccineType) + 
  geom_col() +
  scale_fill_brewer(palette = "Accent")


# 3rd plot
ggplot(combi) + 
  aes(appDTS, n, fill = siteLocation) + 
  geom_col()


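Note that I chose a different colour palette for each plot to make it visible that a different variable is colour-coded each time.

Edit

The OP has clarified:

I want a plot with the date on the x-axis and a count on the y-axis, drawn as bars, plus two lines showing how many vaccinations were given each day.

To plot the number of vaccinations given per day, the data needs to be aggregated further. With data.table this is done by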

combi[!is.na(n), .(n = sum(n)), by = .(appDTS, vaccineType)]
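
For readers without data.table, roughly the same aggregation can be sketched in base R with aggregate(). The toy data frame below is hypothetical and only mimics the column names of combi. With the formula interface, aggregate() drops rows containing NA by default, which plays the role of the !is.na(n) filter here.

```r
# Hypothetical toy data with the same column names as combi
toy <- data.frame(
  appDTS      = as.Date(c("2021-01-04", "2021-01-04", "2021-01-04",
                          "2021-01-05", NA)),
  vaccineType = c("DEF", "DEF", "ABC", "DEF", NA),
  n           = c(134L, 283L, 3L, 122L, NA)
)

# One total per date and vaccine type; the all-NA row is omitted
daily <- aggregate(n ~ appDTS + vaccineType, data = toy, FUN = sum)
daily
```

Each date and vaccine type now appears on exactly one row, which is what a line layer needs in order to draw a single point per day rather than jumping between several values.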

Now the plot with the line overlay can be created by

ggplot(combi) + 
  aes(appDTS, n, fill = appt) + 
  geom_col() +
  scale_fill_brewer(palette = "Paired") + 
  geom_line(
    aes(appDTS, n, colour = vaccineType),
    data = combi[!is.na(n), .(n = sum(n)), by = .(appDTS, vaccineType)],
    inherit.aes = FALSE, size = 1) +
  scale_color_brewer(palette = "Set1")


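inherit.aes = FALSE is required to avoid an error message caused by the appt variable (which is mapped to the fill aesthetic) being missing from the aggregated dataset.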

