How do I manipulate two columns?

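Question

I'm working with some genetic data, and one of my columns isn't in the format I want. I don't know how much biology gets discussed here, but I'm trying to fix how the amino acids are displayed in my data.

Every amino acid has a full name, but it also has a 3-letter abbreviation and a 1-letter abbreviation. My data has the amino acids in 3-letter form, and I want to change them to the 1-letter abbreviations. Here is a sample of my data.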

 chr location           effect   impact AA_change
   1    12543 missense_variant MODERATE  Ala12Val
   1    52367 missense_variant MODERATE  Leu54Pro
   1   752347 missense_variant MODERATE  Met99Ser
   1   984645 missense_variant MODERATE  Lys34Ile
   1   989845 missense_variant MODERATE   Arg4Cys
   1   999854 missense_variant MODERATE  His43Gly
   1   999855 missense_variant MODERATE  Glu14Phe

dat <- structure(list(chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), location = c(12543L, 
52367L, 752347L, 984645L, 989845L, 999854L, 999855L), effect = c("missense_variant", 
"missense_variant", "missense_variant", "missense_variant", "missense_variant", 
"missense_variant", "missense_variant"), impact = c("MODERATE", 
"MODERATE", "MODERATE", "MODERATE", "MODERATE", "MODERATE", "MODERATE"
), AA_change = c("Ala12Val", "Leu54Pro", "Met99Ser", "Lys34Ile", 
"Arg4Cys", "His43Gly", "Glu14Phe")), .Names = c("chr", "location", 
"effect", "impact", "AA_change"), row.names = c(NA, -7L), class = "data.frame")

Here is the list of 3-letter amino acid codes and the 1-letter abbreviation for each.

  Ala == A
  Arg == R
  Asn == N
  Asp == D
  Cys == C
  Glu == E
  Gln == Q
  Gly == G
  His == H
  Ile == I
  Leu == L
  Lys == K
  Met == M
  Phe == F
  Pro == P
  Ser == S
  Thr == T
  Trp == W
  Tyr == Y
  Val == V

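I feel like there must be a simple function for this, but I'm struggling to work out how to do it. I'm used to changing only one part of a column, not two things at once. So what I'm asking is: how do I change this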

Ala12Val
Leu54Pro
Met99Ser
Lys34Ile
Arg4Cys
His43Gly
Glu14Phe

to this

A12V
L54P
M99S
K34I
R4C
H43G
E14F

Can this be done?

Tags: r, regex, bioinformatics, genetics

Answer


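# dat2 is the lookup table defined at the end of this answer; create it first.
aa <- setNames(as.character(dat2$V3), as.character(dat2$V1))  # 3-letter -> 1-letter

# Each run of non-digits is one residue code. Looking it up by name keeps the
# replacements in the order the codes appear in the string.
m <- gregexpr("\\D+", dat$AA_change)
codes <- lapply(regmatches(dat$AA_change, m), function(x) unname(aa[x]))
dat$AA_change1 <- `regmatches<-`(dat$AA_change, m, value = codes)

dat
  chr location           effect   impact AA_change AA_change1
1   1    12543 missense_variant MODERATE  Ala12Val       A12V
2   1    52367 missense_variant MODERATE  Leu54Pro       L54P
3   1   752347 missense_variant MODERATE  Met99Ser       M99S
4   1   984645 missense_variant MODERATE  Lys34Ile       K34I
5   1   989845 missense_variant MODERATE   Arg4Cys        R4C
6   1   999854 missense_variant MODERATE  His43Gly       H43G
7   1   999855 missense_variant MODERATE  Glu14Phe       E14F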



dat2 = read.table(text="Ala == A
  Arg == R
  Asn == N
  Asp == D
  Cys == C
  Glu == E
  Gln == Q
  Gly == G
  His == H
  Ile == I
  Leu == L
  Lys == K
  Met == M
  Phe == F
  Pro == P
  Ser == S
  Thr == T
  Trp == W
  Tyr == Y
  Val == V")[-2]
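
If you would rather not depend on a separate lookup data frame, the same conversion can be sketched with a named vector and capture groups. This is only a minimal sketch: `to_one_letter()` is a hypothetical helper name (not from any package), and it assumes every value has exactly the form "three letters, digits, three letters", as in the sample data. Anything else, including unknown codes, comes back as `NA` instead of being silently mangled.

```r
# Named lookup vector: names are the 3-letter codes, values the 1-letter codes.
aa <- c(Ala = "A", Arg = "R", Asn = "N", Asp = "D", Cys = "C",
        Glu = "E", Gln = "Q", Gly = "G", His = "H", Ile = "I",
        Leu = "L", Lys = "K", Met = "M", Phe = "F", Pro = "P",
        Ser = "S", Thr = "T", Trp = "W", Tyr = "Y", Val = "V")

# Hypothetical helper: converts strings such as "Ala12Val" to "A12V".
to_one_letter <- function(x, lookup = aa) {
  # Assumed format: reference residue, position, alternate residue.
  pattern <- "^([A-Za-z]{3})([0-9]+)([A-Za-z]{3})$"
  ok  <- grepl(pattern, x)
  ref <- sub(pattern, "\\1", x)   # first 3-letter code
  pos <- sub(pattern, "\\2", x)   # position (digits)
  alt <- sub(pattern, "\\3", x)   # second 3-letter code
  out <- paste0(lookup[ref], pos, lookup[alt])
  # Values that do not fit the pattern or use an unknown code become NA.
  out[!ok | is.na(lookup[ref]) | is.na(lookup[alt])] <- NA_character_
  out
}

to_one_letter(c("Ala12Val", "Lys34Ile", "His43Gly", "Ala7Ala"))
# [1] "A12V" "K34I" "H43G" "A7A"
```

Because each residue is looked up by name, the order of the two codes in the string is preserved, and a change where both residues are the same (such as `Ala7Ala`) is handled as well. With the question's data it would be called as `dat$AA_change1 <- to_one_letter(dat$AA_change)`.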
