r - How do I use R to input a number of grouping values and receive an output of observations in an 8x6 grid?
Problem description
I'm a newbie to the world of coding/programming. I'm loving the challenge so far, but I've hit a bit of a roadblock.
I'm attempting to automate genotyping preparation for my job as a lab technician.
Unnecessary background:
I take care of a colony of 500-600 mice with 40-50 genotype constructs at any given time. Whenever I get new litters, I have to extract DNA and, depending on the genotype of the parents, confirm the genotype of the offspring. The task and the challenge of troubleshooting the process were fun at first, but it's getting very mundane and repetitive now. So I've started using R to automate certain parts of my job.
TL;DR: My job is getting repetitive, and I want R to help out with that.
So, in essence, I have an archive of mice as follows. I grouped them by the PCR temperature required for genotyping (PCR_Temp) and by the number of reactions each DNA sample needs (Rxns).
Mouse_ID  Genotype    Gender  Age  Litter_ID  PCR_Temp  Rxns
ZDP658    zDC.Cre     F       4.9  B23844-1   Z         1
ZDP659    zDC.Cre     F       4.9  B23844-1   Z         1
ZDP631    Villin.Cre  F       4.9  B23745-2   Y         1
ZDP575    K14.CreER   M       5.3  B23744-2   Z         1
ZDO931    K14.CreER   M       8.6  B23744-1   Z         1
ZDO932    K14.CreER   M       8.6  B23744-1   Z         1
ZDO933    K14.CreER   M       8.6  B23744-1   Z         1
ZDQ31     Rosa.TSLP   M       3.4  B23701-2   Z         2
ZDQ32     Rosa.TSLP   M       3.4  B23701-2   Z         2
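As a starting point, an excerpt like the one above can be pasted straight into R with read.table(). This is a minimal sketch, assuming the column is named Mouse_ID as in the table; the litter IDs passed to subset() are only examples. It counts how many wells each PCR temperature needs before any controls are added.

```r
# Paste the archive excerpt into read.table(); header = TRUE takes the
# column names from the first line, blank lines are skipped
archive <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Mouse_ID Genotype   Gender Age Litter_ID PCR_Temp Rxns
ZDP658   zDC.Cre    F      4.9 B23844-1  Z        1
ZDP659   zDC.Cre    F      4.9 B23844-1  Z        1
ZDP631   Villin.Cre F      4.9 B23745-2  Y        1
ZDP575   K14.CreER  M      5.3 B23744-2  Z        1
ZDO931   K14.CreER  M      8.6 B23744-1  Z        1
ZDO932   K14.CreER  M      8.6 B23744-1  Z        1
ZDO933   K14.CreER  M      8.6 B23744-1  Z        1
ZDQ31    Rosa.TSLP  M      3.4 B23701-2  Z        2
ZDQ32    Rosa.TSLP  M      3.4 B23701-2  Z        2
")

# Keep only the litters that need genotyping (example IDs), then sum Rxns
# per PCR temperature to get the sample wells needed (controls not included)
todo <- subset(archive, Litter_ID %in% c("B23744-1", "B23701-2"))
wells.per.temp <- tapply(todo$Rxns, todo$PCR_Temp, sum)
wells.per.temp
```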
My goal is to output the individual Mouse_IDs in an 8x6 grid, grouped by PCR_Temp, with each mouse repeated according to its Rxns value, filled in a zigzag order if possible, and with two extra wells per genotype group.
My vision of the output is as follows: I would input the Litter_IDs of the litters that need genotyping and receive a plate layout like the following.
[Image: sketch of the desired plate layout (not available)]
The honeycomb structure is not necessary. A simple rectangular grid works perfectly fine. The same goes for the zigzag format. Both of those aspects of the output format would be nice but aren't a requirement.
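A minimal sketch of the two layouts mentioned above, assuming 8 rows (A-H) by 6 columns and a hypothetical vector of well labels: the plain rectangular grid fills down each column, and the serpentine variant, which is only one possible reading of "zigzag", reverses every second column so the order runs down, up, down.

```r
# Hypothetical well labels: two zDC.Cre mice plus controls, then two
# Rosa.TSLP mice run twice each plus controls
wells <- c("ZDP658", "ZDP659", "PosCont", "NegCont",
           "ZDQ31", "ZDQ31", "ZDQ32", "ZDQ32", "PosCont", "NegCont")

n.rows <- 8  # rows A-H
n.cols <- 6  # columns 1-6
stopifnot(length(wells) <= n.rows * n.cols)  # everything must fit on one plate

# Plain rectangular grid: pad to 48 wells with NA and fill down each column
plate <- matrix(c(wells, rep(NA, n.rows * n.cols - length(wells))),
                nrow = n.rows, ncol = n.cols,
                dimnames = list(LETTERS[1:n.rows], 1:n.cols))

# Serpentine variant: reverse every second column so the order runs
# down column 1, up column 2, down column 3, and so on
snake <- plate
even.cols <- seq(2, n.cols, by = 2)
snake[, even.cols] <- plate[n.rows:1, even.cols]
snake
```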
Each genotype group needs one well for a positive control sample and one well for a wild-type/negative control sample. Samples whose genotype has an Rxns value of 2 or more should be repeated as many times as their Rxns value states.
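The repeat-and-control rule can be sketched in base R with rep(), which repeats each row index Rxns times. This is an illustration on a hand-built subset of the sample rows above, not the answer's code; the column names follow the question's table.

```r
# Subset of the question's sample rows, built by hand for illustration
mice <- data.frame(
  Mouse_ID = c("ZDP658", "ZDP659", "ZDQ31", "ZDQ32"),
  Genotype = c("zDC.Cre", "zDC.Cre", "Rosa.TSLP", "Rosa.TSLP"),
  Rxns     = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# rep() repeats every row index Rxns times, so Rxns = 3 or more also works
expanded <- mice[rep(seq_len(nrow(mice)), times = mice$Rxns), ]

# Split by genotype (keeping the order in which genotypes first appear) and
# append one positive and one negative control well to each group
groups <- split(expanded,
                factor(expanded$Genotype, levels = unique(expanded$Genotype)))
wells <- unlist(lapply(groups, function(g) c(g$Mouse_ID, "PosCont", "NegCont")),
                use.names = FALSE)
wells
```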
I'm sorry if this question is too dense to follow or code for. So far I have been working with dplyr and ggplot to manipulate and visualize my mouse archives, but this particular problem has me at a loss.
If anyone could even point me in the direction of a package that could get me started I would really appreciate it.
So far I have tried some combinations of dplyr and purrr, with no success. I have thought of ways to use for loops but have come up empty.
Thank you in advance for any advice.
Solution
Here is an approach using dplyr, grid and gridExtra.
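Please forgive my messy variable naming conventions.
Your data isn't complex enough to build a good system around, so I generated some random data. You can find it at the end.
First, let's define our litters and filter the mouse data.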
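library(dplyr)
library(grid)
library(gridExtra)
# Litters that need genotyping this round
geno.litters <- c("B23701-2", "B23744-1", "B23844-1","B23944-1")
# Keep only those litters, sort them, and split into one data frame per PCR temperature
mice <- data %>%
  filter(Litter_ID %in% geno.litters) %>%
  arrange(Litter_ID,MouseID) %>%
  split(.,.$PCR_Temp)
mice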
mice is now a list of the mice, split into plates by PCR temperature.
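Since the question asks to input the Litter_IDs, a small hypothetical helper, parse_litters(), could turn a pasted, comma-separated string into the geno.litters vector used above. It is a sketch, not part of the answer; in an interactive session, readline() would let you type the IDs at the console instead.

```r
# Hypothetical helper: turn a pasted string such as "B23701-2, B23744-1"
# into the character vector that filter(Litter_ID %in% ...) expects
parse_litters <- function(txt) {
  ids <- trimws(strsplit(txt, ",")[[1]])  # split on commas, strip spaces
  ids[ids != ""]                          # drop empty entries from stray commas
}

geno.litters <- parse_litters("B23701-2, B23744-1,B23844-1 ,")
geno.litters

# Interactive use only:
# geno.litters <- parse_litters(readline("Litter IDs (comma-separated): "))
```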
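Let's define a custom function that adds positive and negative controls, plus repeated rows for the genotypes that need more than one reaction. We can then apply the function to each list element with lapply.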
addControlSlots <- function(x){
  genotypes <- unique(x$Genotype)
  genotype.dfs <- list()
  for (i in seq_along(genotypes)){
    litter.mice <- x[x$Genotype == genotypes[i],]
    litter <- litter.mice[1,"Litter_ID"]
    Temp <- litter.mice[1,"PCR_Temp"]
    # Repeat each mouse Rxns times (also handles Rxns of 3 or more)
    litter.mice <- litter.mice[rep(seq_len(nrow(litter.mice)), litter.mice$Rxns),]
    litter.mice <- litter.mice[order(litter.mice$MouseID),]
    # One positive and one negative control well per genotype
    control.rows <- data.frame(Litter_ID = litter, MouseID = c("PosCont","NegCont"),Gender = NA,Genotype = genotypes[i], PCR_Temp = Temp, Rxns = 1)
    genotype.dfs[[i]] <- rbind(litter.mice,control.rows)
  }
  do.call(rbind,genotype.dfs)
}
processed.temps <- lapply(mice,addControlSlots)
processed.temps[[2]]
#$Z
# Litter_ID MouseID Gender Genotype PCR_Temp Rxns
#1 B23701-2 ZO960 F zDC.Cre Z 1
#2 B23701-2 ZP810 F zDC.Cre Z 1
#3 B23701-2 ZP992 M zDC.Cre Z 1
#4 B23701-2 PosCont <NA> zDC.Cre Z 1
#5 B23701-2 NegCont <NA> zDC.Cre Z 1
#...15 more rows
We now have controls for each genotype.
Now let's define a function to fill the PCR plate, and again apply it to each element of the list.
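makePCRPlate <- function(x){
  mouse.vector <- as.character(x$MouseID)
  # A plate holds 6 x 8 = 48 wells; stop instead of silently dropping samples
  if (length(mouse.vector) > 48) stop("More than 48 wells needed for PCR temp ", x$PCR_Temp[1])
  plate.vector <- rep(NA,6*8)
  plate.vector[seq_along(mouse.vector)] <- mouse.vector
  # Fill a 2 x 24 matrix column by column (the zigzag), then stack it into 6 x 8
  wide <- matrix(plate.vector,nrow=2,byrow = FALSE)
  rbind(wide[,1:8],wide[,9:16],wide[,17:24])
}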
pcr.plates <- lapply(processed.temps,makePCRPlate)
pcr.plates[[2]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#[1,] "ZO960" "ZP992" "NegCont" "ZO214" "ZP333" "ZP455" "ZP478" "ZQ130"
#[2,] "ZP810" "PosCont" "ZO214" "ZP333" "ZP455" "ZP478" "ZQ130" "ZQ875"
#[3,] "ZQ875" "NegCont" NA NA NA NA NA NA
#[4,] "PosCont" NA NA NA NA NA NA NA
#[5,] NA NA NA NA NA NA NA NA
#[6,] NA NA NA NA NA NA NA NA
We can see that the samples have been filled in a zigzag pattern, two rows at a time.
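makePCRPlate() assumes every sample for one temperature fits on a single 48-well plate. If a temperature group can exceed that, a hypothetical helper such as fill_plates() (not part of the answer) could spread the well labels over several plates with the same two-row zigzag fill. It takes a plain character vector of well labels rather than a data frame.

```r
# Hypothetical helper: split any number of well labels over as many
# 48-well plates as needed, using makePCRPlate's two-row zigzag fill
fill_plates <- function(wells) {
  chunks <- split(wells, ceiling(seq_along(wells) / 48))  # 48 wells per plate
  lapply(chunks, function(chunk) {
    v <- c(chunk, rep(NA, 48 - length(chunk)))       # pad the last plate
    wide <- matrix(v, nrow = 2)                      # 2 x 24, filled in pairs
    rbind(wide[, 1:8], wide[, 9:16], wide[, 17:24])  # stack into 6 x 8
  })
}

plates <- fill_plates(sprintf("M%02d", 1:60))  # 60 dummy sample IDs
length(plates)  # 2 plates
plates[[2]]
```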
Now let's use grid to lay out each plate in a .pdf file.
pdf("MyPCRPlates.pdf")
for(i in seq_along(pcr.plates)){
grid.newpage()
grid.table(pcr.plates[[i]])
grid.text(paste0("PCR Temp ",names(pcr.plates)[i]),y = unit(0.9,"npc"))
}
dev.off()
The .pdf file should have one page per PCR temperature.
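If a spreadsheet is more convenient than a PDF, each plate matrix could also be written to a CSV file with base R's write.csv(). This sketch uses a hypothetical 6 x 8 plate with row letters and column numbers added as dimnames, and reads the file back to check that the layout survived; the file name is only an example.

```r
# Hypothetical 6 x 8 plate standing in for one element of pcr.plates
plate <- matrix(NA_character_, nrow = 6, ncol = 8,
                dimnames = list(LETTERS[1:6], 1:8))  # rows A-F, columns 1-8
plate["A", "1"] <- "ZO960"
plate["B", "1"] <- "ZP810"

# Write one CSV per plate; na = "" leaves empty wells blank in a spreadsheet
out <- file.path(tempdir(), "PCR_plate_Z.csv")
write.csv(plate, out, na = "")

# Read it back to check the layout survived the round trip
back <- read.csv(out, row.names = 1, check.names = FALSE,
                 colClasses = "character", na.strings = "")
back["A", "1"]
```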
Data
set.seed(1)
data1 <- data.frame("MouseID" = paste0("Z",sample(c("O","P","Q"),size = 50,replace = TRUE),round(runif(50,1,999))),
Litter_ID = sample(c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),size=50, replace = TRUE),
Gender = sample(c("F","M"), size = 50, replace = TRUE))
data2 <- data.frame(Genotype = c("zDC.Cre","Villin.Cre","Villin.Cre","zDC.Cre","K14.CreER","Rosa.TSLP","Rosa.TSLP","K14.CreER"),
Litter_ID = c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),
PCR_Temp = c("Z","Y","Y","Z","Y","Z","Z","Y"),
Rxns = c(1,1,1,1,1,2,2,1))
data <- merge(data1,data2)
data
# Litter_ID MouseID Gender Genotype PCR_Temp Rxns
#1 B23701-2 ZP810 F zDC.Cre Z 1
#2 B23701-2 ZP992 M zDC.Cre Z 1
#3 B23701-2 ZO960 F zDC.Cre Z 1
#4 B23744-1 ZO122 F Villin.Cre Y 1
#5 B23744-1 ZQ259 F Villin.Cre Y 1
#... 45 more rows