Merging columns from another data frame into a SummarizedExperiment

Question

I have a SummarizedExperiment that looks like this:

class: RangedSummarizedExperiment
dim: 483731 485
metadata(4): creationDate author BBMRIomicsVersion note
assays(1): data
rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R
rowData names(10): addressA addressB ... probeEnd probeTarget
colnames(485): 200397860027_R01C01 200397860027_R02C02 ... 200556930046_R03C01 200556930046_R06C02
colData names(946): STUDY_NUMBER SampleID ... Basename ID

I also have a data frame that looks like this:

STUDY_NUMBER  UPID    Testosterone  Estradiol  SHBG  Sex
1             UPID01  NA            NA         NA    male
3             UPID02  NA            NA         NA    male
3             UPID03  10.02         62         49.6  male
4             UPID04  NA            NA         NA    male
5             UPID05  NA            NA         NA    female

I want to merge this table (3662 rows) into the colData of the SummarizedExperiment, matching on STUDY_NUMBER. So I used the following code:

colData(aems450k1.MvaluesQCIMPplaqueSE) <- merge(
  colData(aems450k1.MvaluesQCIMPplaqueSE),
  AEDB_Q1_20180223_sex,
  by.x = "STUDY_NUMBER",
  by.y = "STUDY_NUMBER",
  all.x = TRUE
)

This results in the following SummarizedExperiment object:

class: RangedSummarizedExperiment
dim: 483731 485
metadata(4): creationDate author BBMRIomicsVersion note
assays(1): data
rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R
rowData names(10): addressA addressB ... probeEnd probeTarget
colnames: NULL
colData names(952): STUDY_NUMBER SampleID ... Sex T_E2

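You will notice that colnames is now NULL. So my first question is:

How can I prevent this from happening?

My second question:

Could this be because the two data frames are not in the same order (by STUDY_NUMBER)?

Many thanks,

Sander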

Tags: r, dataframe, bioconductor

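Answer

I believe I found the answer; see also: https://support.bioconductor.org/p/114113/#114117

I think the problem is that the merged colData ends up in a different row order than the assay data, which should not happen. However, if I pass sort = FALSE to merge(), everything is fine, and I can add the column names back afterwards. So: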

# Inspect the object before merging
dim(aems450k1.MvaluesQCIMPplaqueSE)
aems450k1.MvaluesQCIMPplaqueSE

# sort = FALSE stops merge() from sorting the rows by STUDY_NUMBER
colData(aems450k1.MvaluesQCIMPplaqueSE) <- merge(
  colData(aems450k1.MvaluesQCIMPplaqueSE),
  AEDB_Q1_20180223_sex,
  by = "STUDY_NUMBER",
  sort = FALSE
)

# Restore the sample names from the ID column of colData
colnames(aems450k1.MvaluesQCIMPplaqueSE) <- aems450k1.MvaluesQCIMPplaqueSE$ID
dim(aems450k1.MvaluesQCIMPplaqueSE)

The result is:

class: RangedSummarizedExperiment
dim: 483731 485
metadata(4): creationDate author BBMRIomicsVersion note
assays(1): data
rownames(483731): cg01707559 cg02004872 ... ch.22.47579720R ch.22.48274842R
rowData names(10): addressA addressB ... probeEnd probeTarget
colnames(485): 8918692001_R01C01 8918692001_R02C01 ... 9221198166_R06C01 9221198166_R06C02
colData names(946): STUDY_NUMBER SampleID ... Basename ID

This is the correct order of the colnames. Without sort = FALSE, the colnames came out in the order colnames(485): 9221198166_R06C02 9221198166_R06C01 ... 8918692001_R02C01 8918692001_R01C01.
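To make the cause concrete, here is a minimal base-R sketch of what merge() does to row names and row order. The `samples` and `clinical` data frames and all their values are invented stand-ins for the colData and the clinical table. Plain data.frames are used so the example runs without Bioconductor, on the assumption that merging a DataFrame behaves the same way in this respect.

```r
# Hypothetical stand-in for colData: one row per sample,
# with the sample IDs stored as row names.
samples <- data.frame(
  STUDY_NUMBER = c(30, 10, 20),
  ID = c("chipA_R01C01", "chipB_R01C01", "chipC_R01C01"),
  row.names = c("chipA_R01C01", "chipB_R01C01", "chipC_R01C01"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the clinical table.
clinical <- data.frame(
  STUDY_NUMBER = c(10, 20, 30, 40),
  Sex = c("female", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# Default merge(): sort = TRUE, so rows are ordered by the key column.
merged_sorted <- merge(samples, clinical, by = "STUDY_NUMBER", all.x = TRUE)

# The sample IDs are no longer the row names; merge() returns
# automatic row names instead.
print(rownames(merged_sorted))

# The rows follow STUDY_NUMBER (10, 20, 30), not the original sample order.
print(merged_sorted$ID)
```

In this toy case the result no longer lines up with the original samples in either row names or order, which matches the two symptoms described above: colnames becoming NULL and the samples being shuffled relative to the assay.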

Does that make sense?
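One caveat: the help page for merge() describes the row order under sort = FALSE as unspecified, so relying on it to preserve the sample order may be fragile. A possible alternative, sketched here in base R with invented `samples` and `clinical` data frames, is to look up each sample's row with match() and add the new columns in place, so that row order and row names are never touched. This is an illustration under those assumptions, not the method used in the answer above.

```r
# Hypothetical stand-in for colData; the third sample has no clinical record.
samples <- data.frame(
  STUDY_NUMBER = c(30, 10, 99),
  ID = c("chipA_R01C01", "chipB_R01C01", "chipC_R01C01"),
  row.names = c("chipA_R01C01", "chipB_R01C01", "chipC_R01C01"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the clinical table.
clinical <- data.frame(
  STUDY_NUMBER = c(10, 20, 30),
  Sex = c("female", "female", "male"),
  stringsAsFactors = FALSE
)

# For each sample, find the position of its STUDY_NUMBER in the clinical
# table. Samples without a record get NA; if a key occurs more than once,
# match() returns the first occurrence.
idx <- match(samples$STUDY_NUMBER, clinical$STUDY_NUMBER)

# Add the new column in the samples' own order; row names stay as they were.
aligned <- samples
aligned$Sex <- clinical$Sex[idx]

print(aligned)
```

With a real SummarizedExperiment the same idea would amount to computing `idx` from the STUDY_NUMBER column of colData and assigning each new column through `$<-`, which leaves colnames and the sample order unchanged. Because match() keeps only the first hit, duplicated STUDY_NUMBER values in the clinical table would need to be resolved beforehand.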

