Change point colours by name in a scatter plot

Problem description

I'm trying to show the yield of each product batch as a scatter plot. The dataset is structured like this:

Batch            Yield
PROD_191105001   88
PROD_191106002   87
PROD_191107003   86
PROD_200203001   98
PROD_200204002   99
PROD_200205003   96

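I now have a good scatter plot that shows the mean and the quality-control chart rules (roughly the 95 % and 99 % limits, drawn at mean ± 2 SD and mean − 3 SD). I want to compare last year's batches with this year's batches, so the points for last year's batches should be grey.

[Image: scatter plot of yield per batch]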

The (spaghetti) code I wrote for that plot:

library(ggplot2)

# Compute the control-chart statistics once instead of once per row inside aes()
y_mean <- mean(data$`normalized yield`)
y_sd   <- sd(data$`normalized yield`)

ggplot(data = data) +
  geom_point(aes(x = Batch, y = `normalized yield`)) +
  geom_hline(yintercept = 115, colour = "green", linewidth = 1) +  # set limit
  geom_hline(yintercept = 140, colour = "green", linewidth = 1) +  # set limit
  geom_hline(yintercept = y_mean - 2 * y_sd, colour = "blue", linewidth = 1) +
  geom_hline(yintercept = y_mean + 2 * y_sd, colour = "blue", linewidth = 1) +
  geom_hline(yintercept = y_mean - 3 * y_sd, colour = "red", linewidth = 1) +
  geom_hline(yintercept = y_mean, linetype = "dashed") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  labs(title = "Product Yield", y = "normalized yield in g", x = "Batch")
# (linewidth requires ggplot2 >= 3.4.0; older versions use size = 1 instead)
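The control limits in the plot are multiples of the standard deviation around the mean. Under a normality assumption, mean ± 2 SD covers about 95.4 % of values and mean ± 3 SD about 99.7 %. The base-R sketch below computes those limits once into a small data frame. As an illustrative assumption, it uses the Yield column from the table above rather than the real `normalized yield` data.

```r
# Yield values from the table in the question (illustrative assumption;
# the real `normalized yield` column may hold different numbers)
yield <- c(88, 87, 86, 98, 99, 96)

# Compute the statistics once
y_mean <- mean(yield)
y_sd <- sd(yield)

# One row per horizontal reference line; the multipliers mirror the question's plot
limits <- data.frame(
  rule = c("mean", "mean - 2 SD", "mean + 2 SD", "mean - 3 SD"),
  yintercept = y_mean + c(0, -2, 2, -3) * y_sd
)
print(limits)
```

You could then pass a data frame like this to a single `geom_hline(data = limits, aes(yintercept = yintercept))` layer, so each line is drawn exactly once.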

How do I tell R to split this dataset at a specific point, so that I can change the colour of that part?

Tags: r, ggplot2

Solution


Super-quick version: use the `colour` aesthetic with a boolean "less than" comparison on `Batch` (note the `colour` added inside `geom_point()`):

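ggplot(data = data) +
  # String comparison: with IDs like PROD_YYMMDDNNN, every 2019 batch sorts before "PROD_200101"
  geom_point(aes(x = Batch, y = `normalized yield`, colour = Batch < "PROD_200101")) +
  scale_colour_manual(values = c("TRUE" = "grey60", "FALSE" = "black"), name = "Before 2020") +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  labs(title = "Product Yield", y = "normalized yield in g", x = "Batch")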
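The quick version relies on ordinary string ordering. The base-R check below uses the batch IDs from the question and assumes they follow a fixed-width PROD_YYMMDDNNN layout. Under that layout, `<` sorts the IDs by date, provided the cutoff string uses the same digit layout.

```r
# Batch IDs from the question (assumed layout: PROD_ + YYMMDD + 3-digit sequence)
batch <- c("PROD_191105001", "PROD_191106002", "PROD_191107003",
           "PROD_200203001", "PROD_200204002", "PROD_200205003")

# Cutoff written in the same YYMMDD layout: 1 January 2020
before_2020 <- batch < "PROD_200101"
print(before_2020)

# A cutoff in a different layout (YYYYMMDD) compares the wrong digits,
# so every batch, including the 2020 ones, ends up "before" it
wrong_cutoff <- batch < "PROD_20200101"
print(wrong_cutoff)
```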

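A more flexible approach is to extract the date from the `Batch` column into its own column, so that you can refer to it directly (for example, to colour the plot):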

library(ggplot2)  # For plotting
library(tibble)  # To make the sample data
library(tidyverse)  # For piping and data transformation
library(lubridate)  # To handle dates

# Sample Data -- I aligned the sample data to 2001 / 2002
data <- tribble(
  ~batch, ~yield,
  'PROD_20011001', 88,
  'PROD_20011102', 87,
  'PROD_20011203', 86,
  'PROD_20020101', 98,
  'PROD_20020202', 99,
  'PROD_20020303', 96,
)

# Create new columns
# Extract date_string, then date, then year
# You could do this all in one step if you want,
# I used three steps to help illustrate what is happening.
data <- data %>%
  mutate(
    batch_date_str = substring(batch, 6),  # Extract the characters that represent the date from each batch
    batch_date = lubridate::ymd(batch_date_str),  # Convert those characters into a 'date'
    batch_year = lubridate::year(batch_date)  # Extract just the year from each batch date
  )

# If you want to color each year distinctly, convert `batch_year` to a factor
# so that ggplot does not use a continuous scale.
ggplot(data, aes(x = batch_date, y = yield, colour = as_factor(batch_year))) +
  geom_point() +
  labs(colour = "Batch year")  # readable legend title instead of "as_factor(batch_year)"
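The sample data in the answer uses a different ID layout from the question. The sketch below applies the same "extract the date first" idea to the question's own IDs, with the explicit "last year in grey" colouring the question asked for. Several details are assumptions. Characters 6 to 11 of `Batch` are taken to hold a YYMMDD date, which is parsed with base R's `as.Date(..., format = "%y%m%d")` instead of lubridate. The `period` column and its labels are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Data from the question; column names follow the question's ggplot code
data <- data.frame(
  Batch = c("PROD_191105001", "PROD_191106002", "PROD_191107003",
            "PROD_200203001", "PROD_200204002", "PROD_200205003"),
  `normalized yield` = c(88, 87, 86, 98, 99, 96),
  check.names = FALSE  # keep the space in the column name
)

# Assumption: characters 6-11 of Batch are the date as YYMMDD
data$batch_date <- as.Date(substr(data$Batch, 6, 11), format = "%y%m%d")
data$batch_year <- as.integer(format(data$batch_date, "%Y"))

# Hypothetical helper column: label everything before the latest year as "previous year"
latest_year <- max(data$batch_year)
data$period <- ifelse(data$batch_year < latest_year, "previous year", "current year")

p <- ggplot(data, aes(x = Batch, y = `normalized yield`, colour = period)) +
  geom_point() +
  scale_colour_manual(values = c("previous year" = "grey60", "current year" = "black")) +
  theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
  labs(title = "Product Yield", y = "normalized yield in g", x = "Batch", colour = "Period")
```

Print `p` to draw the plot. Because `period` is computed from the latest year in the data rather than from a hard-coded cutoff, the grey group moves forward automatically when a new year's batches are added.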
