How to convert colon-separated ("double point") values into their meanings

Problem description

I have data like this:

df<- structure(list(df = structure(c(10L, 8L, 2L, 8L, 7L, 7L, 10L, 
8L, 3L, 10L, 10L, 9L, 9L, 1L, 1L, 3L, 1L, 5L, 5L, 4L, 10L, 8L, 
1L, 1L, 2L, 6L), .Label = c("-1:-1:2", "-1:2:-1", "-1:2:2", "1:01:01", 
"1:1(2):1", "1(1)|1(2):1(1)|1(2):1(1)|1(2)", "1(1)|1(2):2:2", 
"2:-1:-1", "2:-1:2", "2:02:02"), class = "factor")), class = "data.frame", row.names = c(NA, 
-26L))

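I want to expand each value into words that I define. There should be one new column per colon-separated field. Here every value has three fields (separated by two colons), so 3 columns are added after df. Those columns are then filled with words, using this mapping: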

2 = Homo
-1 = No
1 = Het
1(1) = Het1
1(2) = Het2
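These rules form a small lookup table. One way to express it in R is sketched below as a named character vector. The name `meaning` is my own choice, not something from the question.

```r
# Lookup table from the rules above: code -> word
meaning <- c("2" = "Homo", "-1" = "No", "1" = "Het",
             "1(1)" = "Het1", "1(2)" = "Het2")

# Indexing a named vector by name performs the lookup
codes <- c("2", "-1", "1(2)")
looked_up <- unname(meaning[codes])
looked_up

# A code that is not in the table gives NA
meaning["02"]
```

Codes that are not in the table, such as the 02 and 01 that appear in the data, come back as NA. They would need normalizing first, because the expected output treats 02 as 2 and 01 as 1.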

So the expected output looks like this:

2:02:02                        Homo       Homo       Homo
2:-1:-1                        Homo       No         No
-1:2:-1                        No         Homo       No
2:-1:-1                        Homo       No         No
1(1)|1(2):2:2                  Het1 Het2  Homo       Homo
1(1)|1(2):2:2                  Het1 Het2  Homo       Homo
2:02:02                        Homo       Homo       Homo
2:-1:-1                        Homo       No         No
-1:2:2                         No         Homo       Homo
2:02:02                        Homo       Homo       Homo
2:02:02                        Homo       Homo       Homo
2:-1:2                         Homo       No         Homo
2:-1:2                         Homo       No         Homo
-1:-1:2                        No         No         Homo
-1:-1:2                        No         No         Homo
-1:2:2                         No         Homo       Homo
-1:-1:2                        No         No         Homo
1:1(2):1                       Het        Het2       Het
1:1(2):1                       Het        Het2       Het
1:01:01                        Het        Het        Het
2:02:02                        Homo       Homo       Homo
2:-1:-1                        Homo       No         No
-1:-1:2                        No         No         Homo
-1:-1:2                        No         No         Homo
-1:2:-1                        No         Homo       No
1(1)|1(2):1(1)|1(2):1(1)|1(2)  Het1 Het2  Het1 Het2  Het1 Het2
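To check how many new columns are needed, you can count the colon-separated fields in each value. The sketch below is a minimal base-R version that uses a few values copied from the data above. The names `x`, `fields`, `n_fields` and `n_cols` are only illustrative.

```r
# A few values copied from the df column above
x <- c("2:02:02", "-1:2:-1", "1(1)|1(2):2:2",
       "1(1)|1(2):1(1)|1(2):1(1)|1(2)")

# Split each value on ":"; fixed = TRUE treats ":" literally rather than as a regex
fields <- strsplit(x, ":", fixed = TRUE)

# Number of fields per value; the largest one is the number of new columns
n_fields <- lengths(fields)
n_cols <- max(n_fields)
n_cols
```

For the factor column in the question, pass `as.character(df$df)` instead of `x`, since `strsplit()` only accepts character input.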

Tags: r

Solution


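I'm not sure the result is exactly what you need, but this may help. It is probably not the most efficient or elegant solution, but it can serve as a starting point.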

Note that I named your data dats:

head(dats)
                              df
1                        2:02:02
2                        2:-1:-1
3                        -1:2:-1
4                        2:-1:-1
5                  1(1)|1(2):2:2
6                  1(1)|1(2):2:2

I created a mapping data.frame:

mapping
    id value
1    2  Homo
2   -1    No
3    1   Het
4 1(1)  Het1
5 1(2)  Het2
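The answer does not show how `mapping` was created. One way to build an equivalent table is sketched below. This is an assumption on my part: the `dput()` in the edit shows factor columns, whereas this sketch uses plain character columns. `match()` compares both the same way.

```r
# Build the mapping table with character (not factor) columns
mapping <- data.frame(
  id    = c("2", "-1", "1", "1(1)", "1(2)"),
  value = c("Homo", "No", "Het", "Het1", "Het2"),
  stringsAsFactors = FALSE
)

# match() returns, for each code, the row of mapping whose id equals it
found <- mapping$value[match(c("-1", "1(1)"), mapping$id)]
found
```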

First, I split each value on the colons (the "double points") with stringr::str_split_fixed():

library(stringr)
double_point <- as.data.frame.matrix(str_split_fixed(dats$df, ":", 3))
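If you would rather avoid the stringr dependency, base R can do the same split. The minimal sketch below assumes that every value has exactly three colon-separated fields. Unlike `str_split_fixed()`, `rbind()` would recycle shorter rows instead of padding them with empty strings.

```r
# Values copied from the data above
x <- c("2:02:02", "-1:2:-1", "1(1)|1(2):2:2")

# Split on ":" and stack the pieces row-wise into a character matrix
# (assumes every value has exactly three fields, as in the example data)
parts <- do.call(rbind, strsplit(x, ":", fixed = TRUE))
double_point <- as.data.frame(parts, stringsAsFactors = FALSE)
double_point
```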

Now each column's values have to be split on |:

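listed <- list() # empty list, one element per colon-separated field
for (i in seq_len(ncol(double_point))) {
  # split the field on "|" into at most 2 parts, e.g. "1(1)|1(2)" -> "1(1)", "1(2)";
  # values without "|" get "" as their second part
  listed[[i]] <- str_split_fixed(double_point[, i], "\\|", 2)
}

# bind the pieces side by side into one character matrix (6 columns here)
df_ <- do.call(cbind, listed)

# keep an unmodified copy: df_ itself is altered by gsub() in the next step
df_1 <- df_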

# result till now:
head(df_1)
     [,1]   [,2]   [,3] [,4] [,5] [,6]
[1,] "2"    ""     "02" ""   "02" ""  
[2,] "2"    ""     "-1" ""   "-1" ""  
[3,] "-1"   ""     "2"  ""   "-1" ""  
[4,] "2"    ""     "-1" ""   "-1" ""  
[5,] "1(1)" "1(2)" "2"  ""   "2"  ""  
[6,] "1(1)" "1(2)" "2"  ""   "2"  ""
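You can also do the `|` split in base R. The minimal sketch below mirrors `str_split_fixed(..., "\\|", 2)`. It keeps the part before the first `|` and returns everything after it as the second part, or `""` if there is no `|`.

```r
# One colon-separated field, copied from the data above
field <- c("1(1)|1(2)", "2", "-1")

# Part before the first "|" (the whole value when there is no "|")
first <- sub("\\|.*$", "", field)

# Part after the first "|", or "" when there is none
second <- ifelse(grepl("|", field, fixed = TRUE),
                 sub("^[^|]*\\|", "", field), "")

cbind(first, second)
```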

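Now we replace each code with its word from mapping and bind the words to the split original data. The zeros are removed first so that 02 and 01 match the ids 2 and 1: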

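listed <- list()

for (i in seq_len(ncol(df_))) {
  # remove zeros so that "02" and "01" match the ids "2" and "1"
  df_[, i] <- gsub("0", "", df_[, i])
  # look up each code in mapping; codes with no match (including "") give NA
  listed[[i]] <- mapping[match(df_[, i], mapping$id), 2, drop = FALSE]
}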

df_final <- cbind(df_1,do.call(cbind, listed))
head(df_final)
       1    2  3 4  5 6 value value value value value value
1      2      02   02    Homo  <NA>  Homo  <NA>  Homo  <NA>
1.1    2      -1   -1    Homo  <NA>    No  <NA>    No  <NA>
2     -1       2   -1      No  <NA>  Homo  <NA>    No  <NA>
1.2    2      -1   -1    Homo  <NA>    No  <NA>    No  <NA>
4   1(1) 1(2)  2    2    Het1  Het2  Homo  <NA>  Homo  <NA>
4.1 1(1) 1(2)  2    2    Het1  Het2  Homo  <NA>  Homo  <NA>
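`gsub("0", "", ...)` deletes every zero in a code, not just leading ones. That works here only because no code in this data contains a meaningful zero. The minimal sketch below shows the difference. The code "10" in it is hypothetical and does not occur in the data.

```r
codes <- c("02", "01", "-1", "10", "1(2)")

# gsub() removes every "0", so the hypothetical "10" turns into "1"
all_zeros <- gsub("0", "", codes)
all_zeros

# sub() with an anchored pattern removes only leading zeros, keeping "10" intact
leading_zeros <- sub("^0+", "", codes)
leading_zeros
```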

Hope this helps!
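The `NA` columns in the output above come from the empty second parts (`""`), which have no entry in mapping. The sketch below targets the question's expected layout instead: one column per field, with `1(1)|1(2)` rendered as `Het1 Het2`. It is my own base-R variation, not the approach above. The names `meaning`, `translate_field` and `col1` to `col3` are hypothetical choices. It also assumes every value has the same number of fields.

```r
# A few rows of the question's data, as a character column
dats <- data.frame(df = c("2:02:02", "-1:2:-1", "1(1)|1(2):2:2", "1:1(2):1"),
                   stringsAsFactors = FALSE)

# Lookup table: code -> word (same rules as in the question)
meaning <- c("2" = "Homo", "-1" = "No", "1" = "Het",
             "1(1)" = "Het1", "1(2)" = "Het2")

# Translate one colon-separated field, e.g. "1(1)|1(2)" -> "Het1 Het2"
translate_field <- function(field) {
  codes <- strsplit(field, "|", fixed = TRUE)[[1]]
  codes <- sub("^0+", "", codes)  # "02" -> "2", "01" -> "1"
  paste(meaning[codes], collapse = " ")
}

# Split every value on ":" and translate each field, one row per value
fields <- strsplit(dats$df, ":", fixed = TRUE)
words <- do.call(rbind, lapply(fields, function(f)
  vapply(f, translate_field, character(1), USE.NAMES = FALSE)))
colnames(words) <- paste0("col", seq_len(ncol(words)))

result <- data.frame(df = dats$df, words, stringsAsFactors = FALSE)
result
```

With the factor column from the question, use `as.character(dats$df)` in the `strsplit()` call.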

Edit

Here is the mapping as dput() and str() output:

dput(mapping)
structure(list(id = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("-1", 
"1", "1(1)", "1(2)", "2"), class = "factor"), value = structure(c(4L, 
5L, 1L, 2L, 3L), .Label = c("Het", "Het1", "Het2", "Homo", "No"
), class = "factor")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

str(mapping)
'data.frame':   5 obs. of  2 variables:
 $ id   : Factor w/ 5 levels "-1","1","1(1)",..: 5 1 2 3 4
 $ value: Factor w/ 5 levels "Het","Het1","Het2",..: 4 5 1 2 3
