r - R: Apply a different function to each column based on column name
Problem description
I have a data frame structured as follows:
ID X1 X2 X3 X4 X5
1 1 grn gerp hrn asn bln
2 2 asn bln hgv mpl zwl
3 3 zwl mpl lwd <NA> <NA>
4 4 bln asn hrn gerp grn
5 5 lwd mpl zwl <NA> <NA>
I currently use an inefficient method to check whether a row contains words from the following word list:
wordlist <- c("asn", "bln", "gerp", "grn", "hgv", "hrn", "lwd", "mpl", "zwl")
With the method below, I get TRUE or FALSE for each row ID, depending on whether that row contains the word:
newDF <- as.data.frame(DF[,1])
newDF[, wordlist] <- NA
newDF[2] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[1])))
newDF[3] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[2])))
newDF[4] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[3])))
newDF[5] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[4])))
newDF[6] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[5])))
newDF[7] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[6])))
newDF[8] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[7])))
newDF[9] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[8])))
newDF[10] <- apply(DF, 1, function(r) any(r %in% as.character(wordlist[9])))
This produces the following data frame:
DF[, 1] asn bln gerp grn hgv hrn lwd mpl zwl
1 1 TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
2 2 FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
3 3 FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
4 4 TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
5 5 FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
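As you can see, this approach is very inefficient, especially when I have to apply it to a larger DF and a word list of 400 words.
Main question: (Edit: solved)
- Is there an efficient way to get the same result?
Sub-question:
- Instead of outputting TRUE or FALSE, is it possible to output the index of the word within the DF row?
Data frame to try: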
> dput(DF)
structure(list(ID = 1:5, X1 = structure(c(3L, 1L, 5L, 2L, 4L), .Label = c("asn ", "bln", "grn", "lwd", "zwl"), class = "factor"), X2 = structure(c(3L, 2L, 4L, 1L, 4L), .Label = c("asn", "bln", "gerp", "mpl"), class = "factor"), X3 = structure(c(2L, 1L, 3L, 2L, 4L), .Label = c("hgv", "hrn",
"lwd", "zwl"), class = "factor"), X4 = structure(c(1L, 3L,
NA, 2L, NA), .Label = c("asn", "gerp", "mpl"), class = "factor"), X5 = structure(c(1L, 3L, NA, 2L, NA), .Label = c("bln", "grn",
"zwl"), class = "factor")), class = "data.frame", row.names = c(NA, -5L))
Thanks in advance!
Solution
Here is a base R option using match:
t(apply(DF, 1, function(x) sapply(wordlist, function(y)
ifelse(is.na(match(y, x)), FALSE, TRUE))))
# asn bln gerp grn hgv hrn lwd mpl zwl
#[1,] TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
#[2,] FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
#[3,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
#[4,] TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
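For reference, %in% is itself defined in terms of match, so the same logical matrix can be built without ifelse. The sketch below is only an illustration under a few assumptions: the sample data is rebuilt by hand as character columns (the dput above stores them as factors), the ID column is dropped with DF[-1], and the names words, res, DF_clean and res_clean are made up for this example.

```r
# Sample data rebuilt as character columns (assumption: same values as the
# dput above, including the trailing space in "asn " in row 2 of X1).
DF <- data.frame(
  ID = 1:5,
  X1 = c("grn", "asn ", "zwl", "bln", "lwd"),
  X2 = c("gerp", "bln", "mpl", "asn", "mpl"),
  X3 = c("hrn", "hgv", "lwd", "hrn", "zwl"),
  X4 = c("asn", "mpl", NA, "gerp", NA),
  X5 = c("bln", "zwl", NA, "grn", NA),
  stringsAsFactors = FALSE
)
wordlist <- c("asn", "bln", "gerp", "grn", "hgv", "hrn", "lwd", "mpl", "zwl")

# For each row: which entries of wordlist occur among the row's values?
words <- as.matrix(DF[-1])                       # drop ID, keep X1..X5
res <- t(apply(words, 1, function(x) wordlist %in% x))
colnames(res) <- wordlist

# Optional: strip stray whitespace so that "asn " is treated as "asn".
DF_clean <- DF
DF_clean[-1] <- lapply(DF_clean[-1], trimws)
res_clean <- t(apply(as.matrix(DF_clean[-1]), 1, function(x) wordlist %in% x))
colnames(res_clean) <- wordlist
```

With the data as given, row 2 reports FALSE for asn because that cell holds "asn " with a trailing space; in res_clean the same cell is TRUE.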
Or, to get the name of the DF column in which each word matched:
t(apply(DF, 1, function(x) sapply(wordlist, function(y)
ifelse(match(y, x), paste0("X", match(y, x) - 1), NA))))
# asn bln gerp grn hgv hrn lwd mpl zwl
#[1,] "X4" "X5" "X2" "X1" NA "X3" NA NA NA
#[2,] NA "X2" NA NA "X3" NA NA "X4" "X5"
#[3,] NA NA NA NA NA NA "X3" "X2" "X1"
#[4,] "X2" "X1" "X4" "X5" NA "X3" NA NA NA
#[5,] NA NA NA NA NA NA "X1" "X2" "X3"
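The paste0("X", ... - 1) step above relies on the word columns being called X1 to X5 and following a single ID column. If that cannot be assumed, one possible variation is to look the names up in names(DF) instead. This is a sketch, not the answer's method; the sample data is rebuilt by hand as character columns and the names m, idx and res are hypothetical.

```r
# Sample data rebuilt as character columns (assumption: same values as the dput).
DF <- data.frame(
  ID = 1:5,
  X1 = c("grn", "asn ", "zwl", "bln", "lwd"),
  X2 = c("gerp", "bln", "mpl", "asn", "mpl"),
  X3 = c("hrn", "hgv", "lwd", "hrn", "zwl"),
  X4 = c("asn", "mpl", NA, "gerp", NA),
  X5 = c("bln", "zwl", NA, "grn", NA),
  stringsAsFactors = FALSE
)
wordlist <- c("asn", "bln", "gerp", "grn", "hgv", "hrn", "lwd", "mpl", "zwl")

# Position of each word within the full row (ID included), NA if absent.
m <- as.matrix(DF)
idx <- t(apply(m, 1, function(x) match(wordlist, x)))

# Translate positions into column names; NA positions stay NA.
res <- matrix(names(DF)[idx], nrow = nrow(idx),
              dimnames = list(NULL, wordlist))
```

Because the positions are taken over the whole row, names(DF)[idx] needs no offset for the ID column.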
Or, to get the index of the DF column in which each word matched:
t(apply(DF, 1, function(x) sapply(wordlist, function(y) match(y, x))))
# asn bln gerp grn hgv hrn lwd mpl zwl
#[1,] 5 6 3 2 NA 4 NA NA NA
#[2,] NA 3 NA NA 4 NA NA 5 6
#[3,] NA NA NA NA NA NA 4 3 2
#[4,] 3 2 5 6 NA 4 NA NA NA
#[5,] NA NA NA NA NA NA 2 3 4
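For the larger case mentioned in the question (a bigger DF and a 400-word list), one alternative that may be worth benchmarking is to loop over the words instead of the rows and let the comparison run over the whole matrix at once. This is a sketch under the assumption that the word columns are character and the ID column can be dropped; it has not been timed here, and the names m and hit are made up for the example.

```r
# Sample data rebuilt as character columns (assumption: same values as the dput).
DF <- data.frame(
  ID = 1:5,
  X1 = c("grn", "asn ", "zwl", "bln", "lwd"),
  X2 = c("gerp", "bln", "mpl", "asn", "mpl"),
  X3 = c("hrn", "hgv", "lwd", "hrn", "zwl"),
  X4 = c("asn", "mpl", NA, "gerp", NA),
  X5 = c("bln", "zwl", NA, "grn", NA),
  stringsAsFactors = FALSE
)
wordlist <- c("asn", "bln", "gerp", "grn", "hgv", "hrn", "lwd", "mpl", "zwl")

m <- as.matrix(DF[-1])   # character matrix of X1..X5

# For each word, compare against every cell, then count hits per row.
# na.rm = TRUE makes NA cells count as "no match".
hit <- sapply(wordlist, function(w) rowSums(m == w, na.rm = TRUE) > 0)
```

The result is a logical matrix with one row per DF row and one column per word, in the same layout as the first output above.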