How do I use R to input a number of grouping values and receive an output of observations in an 8x6 grid?

Problem description

I'm a newbie to the world of coding/programming. Loving the challenge so far but I've hit a bit of a roadblock.

I'm attempting to automate genotyping preparation for my job as a lab technician.


Unnecessary background:

I take care of a colony of 500-600 mice with 40-50 given genotype constructs at any given time. Whenever I get new litters (depending on the genotype of the parents) I have to extract DNA and confirm the genotype of the offspring. The task and the challenge of trouble-shooting the process were fun at first but it's getting very mundane and repetitive now. So I've started using R to automate certain parts of my job.

TL;DR: My job is getting repetitive, and I want R to help out with that.


So, in essence, I have an archive of mice as follows. I grouped them by the temperature requirements for the genotyping process and by the number of times I need to genotype the DNA samples.

Mouse ID  Genotype   Gender  Age       Litter_ID   PCR_Temp  Rxns
ZDP658    zDC.Cre       F    4.9        B23844-1      Z        1
ZDP659    zDC.Cre       F    4.9        B23844-1      Z        1
ZDP631    Villin.Cre    F    4.9        B23745-2      Y        1
ZDP575    K14.CreER     M    5.3        B23744-2      Z        1
ZDO931    K14.CreER     M    8.6        B23744-1      Z        1
ZDO932    K14.CreER     M    8.6        B23744-1      Z        1
ZDO933    K14.CreER     M    8.6        B23744-1      Z        1
ZDQ31     Rosa.TSLP     M    3.4        B23701-2      Z        2
ZDQ32     Rosa.TSLP     M    3.4        B23701-2      Z        2

My goal is an output of the individual Mouse IDs in an 8x6 grid, grouped by "PCR_Temp" and repeated according to "Rxns", in a zigzag order if possible, with two extra spaces per genotype group.

My vision of the output is as follows. I would want to input the Litter_ID of the litters that need genotyping and receive the following.

PCR Prep

The honeycomb structure is not necessary. A simple rectangular grid works perfectly fine. The same goes for the zigzag format. Both of those aspects of the output format would be nice but aren't a requirement.

Every group of genotypes would need one space for positive control samples and one space for Wild type/Neg control samples. The genotypes that have a value of "2" or more would be repeated as many times as their "Rxns" value states.

I'm sorry if this question is too dense to follow or code for. I have so far been working with dplyr and ggplot to manipulate and visualize my mouse archives but this particular problem has me at a loss.

If anyone could even point me in the direction of a package that could get me started I would really appreciate it.

So far I have tried some combinations of dplyr and purrr with no success. I have thought of ways to use for loops but have come up empty.

Thank you in advance for any advice.

Tags: r, input, output

Solution


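Here is an approach using dplyr, grid, and gridExtra.

Please forgive my messy variable naming conventions.

Your data was not complex enough to build a good system, so I generated some random data. You can find it at the end.

First, let's define our litters and filter the mouse data.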

library(dplyr)
library(grid)
library(gridExtra)

geno.litters <- c("B23701-2", "B23744-1", "B23844-1","B23944-1")
mice <- data %>% 
            filter(Litter_ID %in% geno.litters) %>%
            arrange(Litter_ID,MouseID) %>%
            split(.,.$PCR_Temp)

mice is now a list of mice, split into plates by PCR temperature.
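In case the split step is unfamiliar, here is a minimal sketch of what it does, using a tiny made-up data frame (the toy IDs and the name by.temp are only for illustration, not part of the real data):

```r
# Toy data: four hypothetical mice with two PCR temperature groups
toy <- data.frame(
  MouseID  = c("A1", "A2", "A3", "A4"),
  PCR_Temp = c("Z", "Y", "Z", "Y"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per PCR_Temp value
by.temp <- split(toy, toy$PCR_Temp)

names(by.temp)        # "Y" "Z"
by.temp$Z$MouseID     # "A1" "A3"
```

Each list element can then be processed on its own with lapply, which is what the next steps rely on.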

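Let's define a custom function that adds positive and negative control rows, plus repeated rows for the genotypes that need more than one reaction. We can apply the function to each list element with lapply.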

addControlSlots <- function(x){
  genotypes <- unique(x$Genotype)
  genotype.dfs <- list()
  for (i in seq_along(genotypes)){
    # All mice of this genotype within the current PCR temperature group
    litter.mice <- x[x$Genotype == genotypes[i],]
    litter <- litter.mice[1,"Litter_ID"]
    Temp <- litter.mice[1,"PCR_Temp"]
    # Repeat each mouse's row as many times as its Rxns value
    litter.mice <- litter.mice[rep(seq_len(nrow(litter.mice)), times = litter.mice$Rxns),]
    litter.mice <- litter.mice[order(litter.mice$MouseID),]
    # One positive and one negative control slot per genotype
    control.rows <- data.frame(Litter_ID = litter, MouseID = c("PosCont","NegCont"),Gender = NA,Genotype = genotypes[i], PCR_Temp = Temp, Rxns = 1)
    genotype.dfs[[i]] <- rbind(litter.mice,control.rows)
  }
  do.call(rbind,genotype.dfs)
}

processed.temps <- lapply(mice,addControlSlots)
processed.temps[[2]]
#$Z
#    Litter_ID MouseID Gender  Genotype PCR_Temp Rxns
#1    B23701-2   ZO960      F   zDC.Cre        Z    1
#2    B23701-2   ZP810      F   zDC.Cre        Z    1
#3    B23701-2   ZP992      M   zDC.Cre        Z    1
#4    B23701-2 PosCont   <NA>   zDC.Cre        Z    1
#5    B23701-2 NegCont   <NA>   zDC.Cre        Z    1
#...15 more rows

We now have controls for each genotype.
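Before filling plates, it may be worth checking that each temperature group fits on one 48-well plate. The sketch below counts the wells needed as the sum of Rxns plus two control slots per genotype within each temperature. The toy data and the names rxn.wells and wells.needed are made up for illustration.

```r
# Toy data: hypothetical mice with their genotype, PCR temperature and Rxns
toy <- data.frame(
  Genotype = c("zDC.Cre", "zDC.Cre", "Rosa.TSLP", "Rosa.TSLP", "K14.CreER"),
  PCR_Temp = c("Z", "Z", "Z", "Z", "Y"),
  Rxns     = c(1, 1, 2, 2, 1)
)

# Total reactions per (temperature, genotype) group
rxn.wells <- aggregate(Rxns ~ PCR_Temp + Genotype, data = toy, FUN = sum)

# Add two wells per genotype group: PosCont and NegCont
rxn.wells$wells <- rxn.wells$Rxns + 2

# Wells needed per temperature, and whether one 6 x 8 plate is enough
wells.needed <- tapply(rxn.wells$wells, rxn.wells$PCR_Temp, sum)
wells.needed          # Y = 3, Z = 10
wells.needed <= 6 * 8
```

If a group needs more than 48 wells, it would have to be spread over more than one plate, which the code in this answer does not handle.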

Now let's define a function to fill the PCR plate, and again apply it to the list.

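makePCRPlate <- function(x){
  mouse.vector <- as.character(x$MouseID)
  # A single plate holds 6 x 8 = 48 wells; stop if there are more samples than that
  stopifnot(length(mouse.vector) <= 6*8)
  plate.vector <- rep(NA,6*8)
  plate.vector[seq_along(mouse.vector)] <- mouse.vector
  # Fill a 2 x 24 strip column by column, then stack it as three 2 x 8 blocks
  wide <- matrix(plate.vector,nrow=2,byrow = FALSE)
  rbind(wide[,1:8],wide[,9:16],wide[,17:24])
}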

pcr.plates <- lapply(processed.temps,makePCRPlate)
pcr.plates[[2]]
#     [,1]      [,2]      [,3]      [,4]    [,5]    [,6]    [,7]    [,8]   
#[1,] "ZO960"   "ZP992"   "NegCont" "ZO214" "ZP333" "ZP455" "ZP478" "ZQ130"
#[2,] "ZP810"   "PosCont" "ZO214"   "ZP333" "ZP455" "ZP478" "ZQ130" "ZQ875"
#[3,] "ZQ875"   "NegCont" NA        NA      NA      NA      NA      NA     
#[4,] "PosCont" NA        NA        NA      NA      NA      NA      NA     
#[5,] NA        NA        NA        NA      NA      NA      NA      NA     
#[6,] NA        NA        NA        NA      NA      NA      NA      NA    

We can see that the samples have been filled in a zigzag pattern.
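To make the fill order easier to see, here is a small sketch that applies the same reshaping to the numbers 1 to 48 instead of mouse IDs (fill.order and plate.order are names made up for this illustration):

```r
# Positions 1..48 laid out in a 2 x 24 strip, filled down each column
fill.order <- matrix(1:48, nrow = 2, byrow = FALSE)

# Stack the strip as three 2 x 8 blocks to get the 6 x 8 plate
plate.order <- rbind(fill.order[, 1:8], fill.order[, 9:16], fill.order[, 17:24])
plate.order
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
# [1,]    1    3    5    7    9   11   13   15
# [2,]    2    4    6    8   10   12   14   16
# [3,]   17   19   21   23   25   27   29   31
# [4,]   18   20   22   24   26   28   30   32
# [5,]   33   35   37   39   41   43   45   47
# [6,]   34   36   38   40   42   44   46   48
```

Samples go down, then across, within each pair of rows, so consecutive samples alternate between the upper and lower row of the pair.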

Now let's make a .pdf file, laid out with grid.

pdf("MyPCRPlates.pdf")
for(i in seq_along(pcr.plates)){
  grid.newpage()
  grid.table(pcr.plates[[i]])
  grid.text(paste0("PCR Temp ",names(pcr.plates)[i]),y = unit(0.9,"npc"))
}
dev.off()  


The .pdf file should have one page for each temperature.

Data

set.seed(1)
data1 <- data.frame("MouseID" = paste0("Z",sample(c("O","P","Q"),size = 50,replace = TRUE),round(runif(50,1,999))),
Litter_ID = sample(c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),size=50, replace = TRUE),
Gender = sample(c("F","M"), size = 50, replace = TRUE))

data2 <- data.frame(Genotype = c("zDC.Cre","Villin.Cre","Villin.Cre","zDC.Cre","K14.CreER","Rosa.TSLP","Rosa.TSLP","K14.CreER"), 
                    Litter_ID = c("B23701-2", "B23744-1", "B23744-2", "B23745-2", "B23844-1","B23944-1", "B23944-2", "B23951-1"),
                    PCR_Temp = c("Z","Y","Y","Z","Y","Z","Z","Y"),
                    Rxns = c(1,1,1,1,1,2,2,1))
data <- merge(data1,data2)
data
#   Litter_ID MouseID Gender   Genotype PCR_Temp Rxns
#1   B23701-2   ZP810      F    zDC.Cre        Z    1
#2   B23701-2   ZP992      M    zDC.Cre        Z    1
#3   B23701-2   ZO960      F    zDC.Cre        Z    1
#4   B23744-1   ZO122      F Villin.Cre        Y    1
#5   B23744-1   ZQ259      F Villin.Cre        Y    1
#... 45 more rows
