How do I plot the percentage of positive cases (y-axis) by collection date (x-axis) and another factor (R)?

Question

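Please help! I have case data that I need to turn into a report as soon as possible, but I can't get the chart to display correctly.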

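The dataset has one row per case "record", keyed by CollectionDate (so several rows with the same date mean more cases on that day). I want to show each day's positive cases divided by that day's total (positive + negative) cases as a percentage on the y-axis, with the collection date along the x-axis. Then I want to break this down by region. The goal is something like the plot below, but based on daily positives divided by the number of tests that day rather than just positive vs. negative. I also want to add a horizontal line at 20% to each chart.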
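Put as a formula, the quantity to plot for day d and region r is the share of that day's tests that came back positive. P and N below are the daily positive and negative counts; the notation is introduced here only for illustration:

```latex
\text{PctPositive}_{d,r} = 100 \times \frac{P_{d,r}}{P_{d,r} + N_{d,r}}
```

For example, 3 positives and 12 negatives on one day in one region give 100 × 3/15 = 20%, which sits exactly on the requested reference line.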

    library(ggplot2)
    library(scales)  # for percent_format()

    ggplot(df_final, aes(x = CollectionDate, fill = TestResult)) +
      geom_bar(aes(y = after_stat(prop))) +
      scale_y_continuous(labels = percent_format())

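This is, again, close. But the percentages are wrong, because they compare each day's count with the total across all days rather than with that day's own total.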
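One way to get per-day percentages directly in ggplot2 is a sketch using a different technique from the prop statistic: position = "fill" rescales each day's stacked bar so that its segments sum to 100%. geom_hline() adds the 20% line and facet_wrap() splits the chart by region. The data frame df_demo is made-up dummy data standing in for df_final.

```r
library(ggplot2)
library(scales)

# Made-up dummy data in the same shape as df_final (one row per test).
set.seed(1)
n <- 2000
df_demo <- data.frame(
  Region         = sample(paste("Region", 1:4), n, replace = TRUE),
  CollectionDate = as.Date("2020-03-01") + sample(0:29, n, replace = TRUE),
  TestResult     = sample(c("Positive", "Negative"), n, replace = TRUE,
                          prob = c(0.2, 0.8))
)

# position = "fill" normalises each stacked bar (one per date and panel)
# to a height of 1, so each segment is that day's share of the day's tests.
p_fill <- ggplot(df_demo, aes(x = CollectionDate, fill = TestResult)) +
  geom_bar(position = "fill") +
  geom_hline(yintercept = 0.2, linetype = "dashed") +  # 20% reference line
  scale_y_continuous(labels = percent_format()) +
  facet_wrap(~ Region) +
  labs(x = "Collection date", y = "Share of the day's tests")

print(p_fill)
```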

Then I tried using tally() in the following command to count per region and aggregate:

  df_final %>% 
  group_by(CollectionDate, Region, as.factor(TestResult)) %>% 
  filter(TestResult == "Positive") %>%
  tally()
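The tally() attempt filters to positives before counting, so each day's total number of tests is lost. A sketch of one fix that keeps the same dplyr approach: tally all results first, compute each result's share within its Region and CollectionDate, and only then keep the positive rows. As above, df_demo is made-up dummy data.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Made-up dummy data (one row per test).
set.seed(2)
n <- 2000
df_demo <- data.frame(
  Region         = sample(paste("Region", 1:4), n, replace = TRUE),
  CollectionDate = as.Date("2020-03-01") + sample(0:29, n, replace = TRUE),
  TestResult     = sample(c("Positive", "Negative"), n, replace = TRUE)
)

daily <- df_demo %>%
  group_by(Region, CollectionDate, TestResult) %>%
  tally() %>%                              # n = tests per result per region-day
  group_by(Region, CollectionDate) %>%     # regroup explicitly by region-day
  mutate(pct = n / sum(n)) %>%             # share of that day's tests
  ungroup() %>%
  filter(TestResult == "Positive")         # filter only after computing shares

p_daily <- ggplot(daily, aes(x = CollectionDate, y = pct)) +
  geom_col() +
  geom_hline(yintercept = 0.2, colour = "red", linetype = "dashed") +
  scale_y_continuous(labels = percent_format()) +
  facet_wrap(~ Region)

print(p_daily)
```

Days on which a region had no positive tests produce no row after the filter, so they appear as a missing bar (effectively 0%).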

I still can't get the chart right. Any suggestions?

A quick look at my data:

head(df_final)

Tags: r, ggplot2

Answer


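I can get you halfway there (see the comments in the code for details). This code computes each region's daily percentages of positive and negative cases and plots each region in its own panel. You should be able to adapt it further to compute daily figures for each county; the statewide version should then be a piece of cake. Good luck with the report.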
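For the statewide figure mentioned above, a minimal sketch: leave Region out of the grouping so that each day's positives are divided by all of that day's tests. mean(TestResult == "Positive") gives that share directly, because TRUE counts as 1 and FALSE as 0. The data is made up.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Made-up dummy data; Region is ignored in the statewide view.
set.seed(3)
n <- 3000
df_demo <- data.frame(
  Region         = sample(1:9, n, replace = TRUE),
  CollectionDate = as.Date("2000-03-09") + sample(0:20, n, replace = TRUE),
  TestResult     = sample(c("Positive", "Negative"), n, replace = TRUE)
)

# Statewide: one row per day, pooling every region.
statewide <- df_demo %>%
  group_by(CollectionDate) %>%
  summarise(pct_pos = mean(TestResult == "Positive"), .groups = "drop")

p_state <- ggplot(statewide, aes(x = CollectionDate, y = pct_pos)) +
  geom_col() +
  geom_hline(yintercept = 0.2, linetype = "dashed") +  # 20% reference line
  scale_y_continuous(labels = percent_format())

print(p_state)
```

For a per-county view, the same idea should carry over with a county column (hypothetical here; the sample data has none) in group_by() and facet_wrap().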

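rm(list = ls())  #Clears the workspace; skip this when working with your real data

library(dplyr)
library(magrittr) #Needed for the %<>% assignment pipe
library(ggplot2)
library(scales)
library(tidyr) #Needed for the spread() function

#Dummy data (stands in for your real df_final)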
set.seed(1984)

sdate <- as.Date('2000-03-09')  
edate <- as.Date('2000-05-18')
dateslist <- as.Date(sample(as.numeric(sdate): as.numeric(edate), 10000, replace = TRUE), origin = '1970-01-01')

df_final <- data.frame(Region = rep_len(1:9, 10000), 
                 CollectionDate = dateslist, 
                 TestResult = sample(c("Positive", "Negative"), 10000, replace = TRUE))


#First tally the positive and negative cases
#by Region, CollectionDate and TestResult, in that order
df_final %<>% 
  group_by(Region, CollectionDate, TestResult) %>%
  tally()


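#Then spread the counts (in n) into separate Negative and Positive
#columns for each Region-CollectionDate combination
#(fill = 0 covers region-days with no cases of one kind, which would otherwise be NA),
#and convert them to proportions of that day's total.
#Total is computed first because mutate() evaluates its arguments in order:
#otherwise Positive would be divided using the already-converted Negative.
#Now you have Negative and Positive proportions by CollectionDate by Region
df_final %<>% 
  spread(key = TestResult, value = n, fill = 0) %>% 
  mutate(Total    = Negative + Positive,
         Negative = Negative / Total, 
         Positive = Positive / Total)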



#Plotting this now
#Since the percentages are available already
#Use geom_col() instead of geom_bar()
df_final %>% ggplot() + 
  geom_col(aes(x = CollectionDate, y = Positive, fill = "Positive"), 
           position = "identity", alpha = 0.4) + 
  geom_col(aes(x = CollectionDate, y = Negative, fill = "Negative"), 
           position = "identity", alpha = 0.4) +
  facet_wrap(~ Region, nrow = 3, ncol = 3)

This produces a three-by-three grid of bar charts, one panel per region.
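One detail worth checking in the proportion step: dplyr's mutate() evaluates its arguments in order, so an expression can see a column that an earlier argument in the same call has already overwritten. A tiny check with made-up counts (3 negatives, 1 positive):

```r
library(dplyr)

counts <- data.frame(Negative = 3, Positive = 1)

# Overwriting Negative first: the Positive expression then sees the new
# Negative (0.75, a proportion), not the original count of 3.
wrong <- counts %>%
  mutate(Negative = Negative / (Negative + Positive),
         Positive = Positive / (Negative + Positive))

# Computing the total once, before overwriting either column.
right <- counts %>%
  mutate(Total    = Negative + Positive,
         Negative = Negative / Total,
         Positive = Positive / Total)

print(wrong)  # Positive is 1 / (0.75 + 1), about 0.571
print(right)  # Negative 0.75, Positive 0.25
```

Computing the total once, before either column is overwritten, gives proportions that sum to 1 for every region-day.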

