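r - Merge multiple files based on a column and print the nth column in R
Problem description
I have 3 files. I need to take the first file and, for each row, match it against the first column of file 2. Then I need to take the corresponding alias from file2, match it against file3 (either the description or the Aliases column), and print the OMIM Id.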
File1:
**Symbol**
MCL1
ABCB1
BAX
IKZF1
WWOX
BCL2L1
BCL2L11
CCND1
TNFSF10
File2:
**Symbol2 Aliases**
MCL1 MCL1, BCL2 family apoptosis regulator
ABCB1 ATP binding cassette subfamily B member 1
WWOX WW domain containing oxidoreductase
BCL2L1 RB transcriptional corepressor 1
BOK peroxisome proliferator activated receptor gamma
RHOA ras homolog family member A
ABCC1 C-X-C motif chemokine ligand 12
PARP1 poly(ADP-ribose) polymerase 1
BAK1 BRCA1, DNA repair associated
file3:
**description OMIM Aliases**
MCL1, BCL2 family apoptosis regulator 159552 G protein subunit alpha 12
ATP binding cassette subfamily B member 1 171050 matrix metallopeptidase 9
BCL2 associated X, apoptosis regulator 600040 cadherin 1
IKAROS family zinc finger 1 603023 Janus kinase 2
WW domain containing oxidoreductase 605131 ataxin 3
BCL2 like 1 600039 RB transcriptional corepressor 1
BCL2 like 11 603827 transferrin receptor
cyclin D1 168461 C-C motif chemokine ligand 2
TNF superfamily member 10 603598 prostaglandin-endoperoxide synthase 2
Expected result:
**Symbol Symbol1 description/Aliases OMIM**
MCL1 MCL1 MCL1, BCL2 family apoptosis regulator 159552
ABCB1 ABCB1 ATP binding cassette subfamily B member 1 171050
BAX
IKZF1
WWOX WWOX WW domain containing oxidoreductase 605131
BCL2L1 BCL2L1 RB transcriptional corepressor 1 600039
BCL2L11
CCND1
TNFSF10
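I tried merge and inner_join but did not get the expected result. Any help?
Solution
Another possibility is to rename the relevant columns you want to merge on and then use purrr::reduce with dplyr::left_join (or, in base R, Reduce with merge).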
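# Rename so that the key columns share a name across the data frames
names(df2) <- c("Symbol", "Description/Aliases")
names(df3) <- c("Description/Aliases", "OMIM", "Aliases")
library(dplyr)  # provides %>%
# Left-join successively on the shared column names, then drop the unused Aliases column
purrr::reduce(list(df1, df2, df3), dplyr::left_join) %>% dplyr::select(-Aliases)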
# Symbol Description/Aliases OMIM
#1 MCL1 MCL1, BCL2 family apoptosis regulator 159552
#2 ABCB1 ATP binding cassette subfamily B member 1 171050
#3 BAX <NA> NA
#4 IKZF1 <NA> NA
#5 WWOX WW domain containing oxidoreductase 605131
#6 BCL2L1 RB transcriptional corepressor 1 NA
#7 BCL2L11 <NA> NA
#8 CCND1 <NA> NA
#9 TNFSF10 <NA> NA
Or in base R:
Reduce(function(x, y) merge(x, y, all.x = TRUE), list(df1, df2, df3))
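Note that the joins above match the alias only against the description column of file3, which is why BCL2L1 ends up with NA although the question expects 600039 (its alias appears only in the Aliases column of file3). The following is a minimal base R sketch of one way to check both columns, using the sample data from this post. The helper variables `alias`, `row_desc`, `row_alias` and `row` are illustrative names, and preferring a description match over an Aliases match is an assumption, not something the question specifies.

```r
df1 <- read.table(text = "Symbol
MCL1
ABCB1
BAX
IKZF1
WWOX
BCL2L1
BCL2L11
CCND1
TNFSF10", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "Symbol2 Aliases
MCL1 'MCL1, BCL2 family apoptosis regulator'
ABCB1 'ATP binding cassette subfamily B member 1'
WWOX 'WW domain containing oxidoreductase'
BCL2L1 'RB transcriptional corepressor 1'
BOK 'peroxisome proliferator activated receptor gamma'
RHOA 'ras homolog family member A'
ABCC1 'C-X-C motif chemokine ligand 12'
PARP1 'poly(ADP-ribose) polymerase 1'
BAK1 'BRCA1, DNA repair associated'", header = TRUE, stringsAsFactors = FALSE)

df3 <- read.table(text = "description OMIM Aliases
'MCL1, BCL2 family apoptosis regulator' 159552 'G protein subunit alpha 12'
'ATP binding cassette subfamily B member 1' 171050 'matrix metallopeptidase 9'
'BCL2 associated X, apoptosis regulator' 600040 'cadherin 1'
'IKAROS family zinc finger 1' 603023 'Janus kinase 2'
'WW domain containing oxidoreductase' 605131 'ataxin 3'
'BCL2 like 1' 600039 'RB transcriptional corepressor 1'
'BCL2 like 11' 603827 'transferrin receptor'
'cyclin D1' 168461 'C-C motif chemokine ligand 2'
'TNF superfamily member 10' 603598 'prostaglandin-endoperoxide synthase 2'",
header = TRUE, stringsAsFactors = FALSE)

# Step 1: look up each Symbol in df2; symbols missing from df2 yield NA
alias <- df2$Aliases[match(df1$Symbol, df2$Symbol2)]

# Step 2: locate the df3 row by description, falling back to the Aliases column
row_desc  <- match(alias, df3$description)
row_alias <- match(alias, df3$Aliases)
row <- ifelse(is.na(row_desc), row_alias, row_desc)

# Step 3: assemble the result in the original order of df1
result <- data.frame(Symbol = df1$Symbol,
                     `description/Aliases` = alias,
                     OMIM = df3$OMIM[row],
                     check.names = FALSE,
                     stringsAsFactors = FALSE)
print(result)
```

Because `match` returns only the first hit, this sketch assumes each alias occurs at most once in file3; duplicated aliases would need a join instead.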
Sample data
df1 <- read.table(text =
"Symbol
MCL1
ABCB1
BAX
IKZF1
WWOX
BCL2L1
BCL2L11
CCND1
TNFSF10", header = T)
df2 <- read.table(text =
"Symbol2 Aliases
MCL1 'MCL1, BCL2 family apoptosis regulator'
ABCB1 'ATP binding cassette subfamily B member 1'
WWOX 'WW domain containing oxidoreductase'
BCL2L1 'RB transcriptional corepressor 1'
BOK 'peroxisome proliferator activated receptor gamma'
RHOA 'ras homolog family member A'
ABCC1 'C-X-C motif chemokine ligand 12'
PARP1 'poly(ADP-ribose) polymerase 1'
BAK1 'BRCA1, DNA repair associated'", header = T)
df3 <- read.table(text =
"description OMIM Aliases
'MCL1, BCL2 family apoptosis regulator' 159552 'G protein subunit alpha 12'
'ATP binding cassette subfamily B member 1' 171050 'matrix metallopeptidase 9'
'BCL2 associated X, apoptosis regulator' 600040 'cadherin 1'
'IKAROS family zinc finger 1' 603023 'Janus kinase 2'
'WW domain containing oxidoreductase' 605131 'ataxin 3'
'BCL2 like 1' 600039 'RB transcriptional corepressor 1'
'BCL2 like 11' 603827 'transferrin receptor'
'cyclin D1' 168461 'C-C motif chemokine ligand 2'
'TNF superfamily member 10' 603598 'prostaglandin-endoperoxide synthase 2'", header = T)
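The sample data above is read from inline text. Since the question starts from files on disk, here is a hedged sketch of reading them instead and joining without renaming columns, by passing the key names to merge explicitly. The file names `file1.txt` and `file2.txt` are hypothetical, and a tab separator is assumed because the alias fields contain spaces and commas; the sketch writes two tiny files to a temporary directory only so that it runs on its own.

```r
# Hypothetical file names; small tab-separated samples are written to a temp dir
dir <- tempdir()
f1 <- file.path(dir, "file1.txt")
f2 <- file.path(dir, "file2.txt")
writeLines(c("Symbol", "MCL1", "BAX"), f1)
writeLines(c("Symbol2\tAliases",
             "MCL1\tMCL1, BCL2 family apoptosis regulator"), f2)

# read.delim uses tab as the separator; quote = "" keeps quote characters
# inside fields from being interpreted as string delimiters
df1 <- read.delim(f1, stringsAsFactors = FALSE)
df2 <- read.delim(f2, stringsAsFactors = FALSE, quote = "")

# Left join with explicit keys, so no renaming is needed;
# note that merge sorts the result by the key column
merged <- merge(df1, df2, by.x = "Symbol", by.y = "Symbol2", all.x = TRUE)
print(merged)
```

The same `by.x`/`by.y` pattern would apply to the join against file3, with the alias column as the key.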