How to run a loop in R and export data

Problem description

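I have a question. We have generated pilot gene expression data with one sample per condition; this is only a test run. I have one control (baseline) sample followed by 5 different samples, and I analyzed them with the edgeR package in R. I want to designate my control sample as the baseline, compute logFC, logCPM and PValue for every sample against it, and export them from the et$table object as csv files: Control vs Sample_1, Control vs Sample_2, and so on up to Control vs Sample_5. How can I write a loop that exports the data for all comparisons? We are planning to analyze hundreds of samples and multiple conditions, so having this in place will make it easier to run on large datasets later.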

Thanks,

Tufik

Input data

dput(Counts_Test)
structure(list(Control = c(0L, 184L, 60L, 0L, 7L, 0L, 87L, 0L, 
0L, 21L, 193L, 29L, 0L, 0L, 3L, 50L, 0L, 325L, 442L), Sample_1 = c(0, 
140.5, 64, 0, 4, 0, 83, 0, 1, 51.5, 199, 25, 0, 0, 5, 62, 0, 
525, 407), Sample_2 = c(0, 169, 45, 1, 3, 0, 122, 0, 0, 36.5, 
179, 20, 0, 0, 1, 58, 0, 494, 570), Sample_3 = c(0L, 107L, 67L, 
0L, 5L, 0L, 99L, 0L, 0L, 63L, 178L, 34L, 0L, 0L, 2L, 60L, 0L, 
467L, 283L), Sample_4 = c(0L, 221L, 44L, 0L, 1L, 0L, 139L, 0L, 
0L, 48L, 222L, 24L, 1L, 0L, 5L, 67L, 0L, 612L, 451L), Sample_5 = c(0, 
120.5, 45, 1, 1, 0, 100, 0, 0, 44.5, 202, 39, 1, 0, 3, 76, 0, 
719, 681)), class = "data.frame", row.names = c("Gene1", "Gene2", 
"Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", "Gene9", 
"Gene10", "Gene11", "Gene12", "Gene13", "Gene14", "Gene15", "Gene16", 
"Gene17", "Gene18", "Gene19"))


dput(Sample_Grouping)
structure(list(SampleID = c("xx-xx-1551", "xx-xx-1548", "xx-xx-1549", 
"xx-xx-1550", "xx-xx-1552", "xx-xx-0093"), ID = c("Control", 
"Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")), class = "data.frame", row.names = c("Control", 
"Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"))

The R code example is given below, along with the input and expected data:

library(edgeR)
bcv <- 0.4
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
et <- exactTest(y, dispersion=bcv^2)
View(et$table)
write.csv(et$table, file="./table_Control_vs_Sample_1.csv")

The csv file exported from et$table for the Control vs Sample_1 comparison:

dput(et$table)
structure(list(logFC = c(2.56274120305193e-15, -0.550254150196693, 
-0.0683012466357368, 2.56274120305193e-15, -0.946184736817423, 
2.56274120305193e-15, -0.229115434595494, 2.56274120305193e-15, 
3.1003487058189, 1.12824320628868, -0.117305916091926, -0.373934941685134, 
2.56274120305193e-15, 2.56274120305193e-15, 0.557345502442179, 
0.148458971210359, 2.56274120305193e-15, 0.530168392608693, -0.280483136036388
), logCPM = c(10.2398977391586, 16.5699363191385, 15.0947320625249, 
10.459922538352, 11.7193421876963, 10.2398977391586, 15.9893772819593, 
10.2398977391586, 10.3561232675437, 14.7852593647762, 16.8836544336529, 
14.184917355539, 10.4590444918373, 10.2398977391586, 11.6104987130447, 
15.2490804549745, 10.2398977391586, 18.2653428429955, 18.1145148472016
), PValue = c(1, 0.523490569569589, 0.978749897705603, 1, 0.6666864407408, 
1, 0.797857807297049, 1, 1, 0.227758918677035, 0.896097547311082, 
0.732358557879292, 1, 1, 0.788137722865551, 0.88329454414985, 
1, 0.532994222767919, 0.743046444999486)), class = "data.frame", row.names = c("Gene1", 
"Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", 
"Gene9", "Gene10", "Gene11", "Gene12", "Gene13", "Gene14", "Gene15", 
"Gene16", "Gene17", "Gene18", "Gene19"))

Expected output

Likewise, I would like to export a csv file of et$table for each comparison (Control vs Sample_1, Control vs Sample_2, Control vs Sample_3, Control vs Sample_4, Control vs Sample_5), with the sample name added as a suffix to the output csv file name.

Tags: r, loops, recursion, apply, export-to-csv

Solution


Is this what you are looking for?

sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
for(cur_name in sample_names){
  # ... (run edgeR for the current comparison here)
  write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
}

The "..." stands for the lines where you run edgeR for each comparison.

In each iteration, you can use the pair argument of the exactTest function to specify the comparison. So your complete code might look like this:

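library(edgeR)

sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
bcv <- 0.4  # BCV set by hand, since there are no replicates to estimate it from
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
for(cur_name in sample_names){
  # Compare the current sample against Control (first element of pair = baseline)
  et <- exactTest(y, pair=c("Control", cur_name), dispersion=bcv^2)
  # One csv file per comparison, with the sample name as suffix
  write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
}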

Note that exactTest uses the first element of the pair argument as the baseline of the comparison.
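
Since you mention scaling up to hundreds of samples, the list of sample names does not have to be typed by hand. Here is a minimal sketch, assuming your grouping table has an ID column like Sample_Grouping does; the small table below is only a stand-in, and baseline and out_files are illustrative names I made up:

```r
# Stand-in for Sample_Grouping; only the ID column matters here
Sample_Grouping <- data.frame(
  ID = c("Control", "Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5"),
  stringsAsFactors = FALSE
)

baseline <- "Control"

# Every distinct group except the baseline becomes one comparison
sample_names <- setdiff(unique(Sample_Grouping$ID), baseline)

# One output file name per comparison, with the sample name as suffix
out_files <- paste0("table_", baseline, "_vs_", sample_names, ".csv")
print(out_files)
```

You could then loop over sample_names exactly as before, so adding more samples to the grouping table requires no change to the loop.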

If this does not solve your problem, feel free to reply; if it does, please accept the answer.

As a further addition, in response to your follow-up question below, you can enforce the gene order this way:

library(edgeR)

sample_names <- c("Sample_1", "Sample_2", "Sample_3", "Sample_4", "Sample_5")
bcv <- 0.4
y <- DGEList(counts=Counts_Test, group=Sample_Grouping$ID)
for(cur_name in sample_names){
  et <- exactTest(y, pair=c("Control", cur_name), dispersion=bcv^2)
  if(cur_name == sample_names[1]){
    # In the first iteration, capture the order
    geneOrder <- row.names(et$table)
  }else{
    # In the subsequent iterations, enforce the order
    et$table <- et$table[geneOrder,]
  }
  # Now, you can write
  write.csv(et$table, file=paste0("./table_Control_vs_", cur_name, ".csv"))
}
# The if/else statement will ensure the order is the same for all
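
If you want to confirm that the exported files really share one gene order, you could read them back and compare the row names. This is only a sketch: the two small tables are made-up stand-ins for et$table (not real edgeR output), and the files go to a temporary directory rather than your working directory:

```r
# Two made-up result tables standing in for et$table from two comparisons
tab1 <- data.frame(logFC = c(0.1, -0.5, 1.2),
                   row.names = c("Gene1", "Gene2", "Gene3"))
tab2 <- data.frame(logFC = c(0.3, 0.9, -0.2),
                   row.names = c("Gene3", "Gene1", "Gene2"))

# Enforce the order of the first table on the second, as in the loop above
geneOrder <- row.names(tab1)
tab2 <- tab2[geneOrder, , drop = FALSE]

# Write both tables to a temporary directory and read them back
out_dir <- tempdir()
f1 <- file.path(out_dir, "table_Control_vs_Sample_1.csv")
f2 <- file.path(out_dir, "table_Control_vs_Sample_2.csv")
write.csv(tab1, file = f1)
write.csv(tab2, file = f2)

back1 <- read.csv(f1, row.names = 1)
back2 <- read.csv(f2, row.names = 1)

# TRUE when both files list the genes in the same order
same_order <- identical(row.names(back1), row.names(back2))
print(same_order)
```

The drop = FALSE is only needed here because the stand-in tables have a single column; et$table has three columns, so plain et$table[geneOrder,] keeps it a data frame.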
