r - How to convert colon-separated values into their meanings
Question
I have data like this:
df<- structure(list(df = structure(c(10L, 8L, 2L, 8L, 7L, 7L, 10L,
8L, 3L, 10L, 10L, 9L, 9L, 1L, 1L, 3L, 1L, 5L, 5L, 4L, 10L, 8L,
1L, 1L, 2L, 6L), .Label = c("-1:-1:2", "-1:2:-1", "-1:2:2", "1:01:01",
"1:1(2):1", "1(1)|1(2):1(1)|1(2):1(1)|1(2)", "1(1)|1(2):2:2",
"2:-1:-1", "2:-1:2", "2:02:02"), class = "factor")), class = "data.frame", row.names = c(NA,
-26L))
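I want to expand each value into words that I define. I want as many new columns as there are colon-separated fields; here there are three, so three columns are added after df. The columns are then filled with words according to this key: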
2 = Homo
-1 = No
1 = Het
1(1) = Het1
1(2) = Het2
So the expected output looks like this:
2:02:02 Homo Homo Homo
2:-1:-1 Homo No No
-1:2:-1 No Homo No
2:-1:-1 Homo No No
1(1)|1(2):2:2 Het1 Het2 Homo Homo
1(1)|1(2):2:2 Het1 Het2 Homo Homo
2:02:02 Homo Homo Homo
2:-1:-1 Homo No No
-1:2:2 No Homo Homo
2:02:02 Homo Homo Homo
2:02:02 Homo Homo Homo
2:-1:2 Homo No Homo
2:-1:2 Homo No Homo
-1:-1:2 No No Homo
-1:-1:2 No No Homo
-1:2:2 No Homo Homo
-1:-1:2 No No Homo
1:1(2):1 Het Het2 Het
1:1(2):1 Het Het2 Het
1:01:01 Het Het Het
2:02:02 Homo Homo Homo
2:-1:-1 Homo No No
-1:-1:2 No No Homo
-1:-1:2 No No Homo
-1:2:-1 No Homo No
1(1)|1(2):1(1)|1(2):1(1)|1(2) Het1 Het2 Het1 Het2 Het1 Het2
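Answer
I'm not sure the result is exactly what you need, but this may help. It is probably not the most efficient or elegant solution, but it can serve as a starting point.
Note that I named your data dats: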
head(dats)
df
1 2:02:02
2 2:-1:-1
3 -1:2:-1
4 2:-1:-1
5 1(1)|1(2):2:2
6 1(1)|1(2):2:2
I created a mapping data.frame:
mapping
id value
1 2 Homo
2 -1 No
3 1 Het
4 1(1) Het1
5 1(2) Het2
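As a minimal sketch, a mapping with the same five rows could be built by hand as shown here. This is an assumption about how to reconstruct it, not necessarily how the answer created it: the answer's version stores both columns as factors, whereas this sketch uses plain character columns.

```r
# Key from the question: code on the left, word on the right.
# stringsAsFactors = FALSE keeps both columns as character (an assumption;
# the answer's mapping used factors).
mapping <- data.frame(
  id    = c("2", "-1", "1", "1(1)", "1(2)"),
  value = c("Homo", "No", "Het", "Het1", "Het2"),
  stringsAsFactors = FALSE
)

# match() returns the row position of each code in mapping$id
mapping$value[match(c("1(2)", "-1"), mapping$id)]
```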
First, I split the values on the colons with stringr::str_split_fixed():
library(stringr)
double_point <- as.data.frame.matrix(str_split_fixed(dats$df, ":", 3))
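To make the shape of that result concrete, here is a small sketch on two values taken from the data (the vector name x is only illustrative). str_split_fixed() always returns a character matrix with exactly n columns, one row per input string.

```r
library(stringr)

# Two sample values from the question's data
x <- c("2:02:02", "1(1)|1(2):2:2")

# Split each string on ":" into exactly 3 pieces
parts <- str_split_fixed(x, ":", 3)
parts

# The "|" inside the first field of the second value is left untouched
# at this stage; it is handled by a second split.
```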
Now we have to split the values in each column on |:
listed <- list() # empty list
for (i in (1:ncol(double_point))){
listed[[i]] <- (double_point[,i])
listed[[i]] <- str_split_fixed(listed[[i]], "\\|", 2)
}
# bind the pieces column-wise into one character matrix
df_ <- do.call(cbind, listed)
# keep an untouched copy of the split values for the final result
df_1 <- df_
# result so far:
head(df_1)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "2" "" "02" "" "02" ""
[2,] "2" "" "-1" "" "-1" ""
[3,] "-1" "" "2" "" "-1" ""
[4,] "2" "" "-1" "" "-1" ""
[5,] "1(1)" "1(2)" "2" "" "2" ""
[6,] "1(1)" "1(2)" "2" "" "2" ""
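Now we have to replace the values using the mapping and bind them to the split original data (df_1 in this case):
listed <- list()
for (i in seq_len(ncol(df_))) {
  # strip the leading zero so that "02" and "01" match the ids "2" and "1"
  df_[, i] <- sub("^0", "", df_[, i])
  # look up each value in the mapping; unmatched (empty) cells become NA
  listed[[i]] <- mapping[match(df_[, i], mapping$id), 2, drop = FALSE]
}
# bind the untouched split values (df_1) to the translated columns
df_final <- cbind(df_1, do.call(cbind, listed))
head(df_final)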
       1    2  3 4  5 6 value value value value value value
1      2      02   02    Homo  <NA>  Homo  <NA>  Homo  <NA>
1.1    2      -1   -1    Homo  <NA>    No  <NA>    No  <NA>
2     -1       2   -1      No  <NA>  Homo  <NA>    No  <NA>
1.2    2      -1   -1    Homo  <NA>    No  <NA>    No  <NA>
4   1(1) 1(2)  2    2    Het1  Het2  Homo  <NA>  Homo  <NA>
4.1 1(1) 1(2)  2    2    Het1  Het2  Homo  <NA>  Homo  <NA>
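For comparison, the same idea (split on ":", split on "|", drop the leading zero, look up the word) can be sketched in base R only. This is an alternative under stated assumptions, not the method used above: lookup and translate_code are hypothetical names, and rows are padded with NA on the right so the result stays rectangular, which is closer to the layout in the question than the fixed six-column layout.

```r
# Named character vector used as a lookup table (hypothetical name)
lookup <- c("2" = "Homo", "-1" = "No", "1" = "Het",
            "1(1)" = "Het1", "1(2)" = "Het2")

# Hypothetical helper: translate one colon-separated code into words
translate_code <- function(code, lookup) {
  fields  <- strsplit(code, ":", fixed = TRUE)[[1]]    # "2:02:02" -> "2" "02" "02"
  alleles <- unlist(strsplit(fields, "|", fixed = TRUE)) # split "1(1)|1(2)" too
  alleles <- sub("^0", "", alleles)                     # "02" -> "2"
  unname(lookup[alleles])                               # NA for unknown codes
}

# A few sample values; for the real data use as.character(dats$df),
# because the column is a factor
codes <- c("2:02:02", "2:-1:-1", "1(1)|1(2):2:2", "1:1(2):1")
words <- lapply(codes, translate_code, lookup = lookup)

# Pad every row to the longest length so the rows can be bound together
n <- max(lengths(words))
padded <- do.call(rbind, lapply(words, function(w) {
  c(w, rep(NA_character_, n - length(w)))
}))
result <- data.frame(df = codes, padded, stringsAsFactors = FALSE)
result
```

Indexing a named vector keeps the lookup to a single step per row; the mapping data.frame with match() does the same job and may be easier to maintain if the key is long.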
Hope this helps!
Edit
Here are the dput() and str() of mapping:
dput(mapping)
structure(list(id = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("-1",
"1", "1(1)", "1(2)", "2"), class = "factor"), value = structure(c(4L,
5L, 1L, 2L, 3L), .Label = c("Het", "Het1", "Het2", "Homo", "No"
), class = "factor")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))
str(mapping)
'data.frame': 5 obs. of 2 variables:
$ id : Factor w/ 5 levels "-1","1","1(1)",..: 5 1 2 3 4
$ value: Factor w/ 5 levels "Het","Het1","Het2",..: 4 5 1 2 3