Check a data.frame column (character) against 4 lists

Problem description

I want to check a word (in a column of a data frame) against 4 lists (a, b, c, d):

if df$word is in a then df$code <- 1
if df$word is in b then df$code <- 2
if df$word is in c then df$code <- 3
if df$word is in d then df$code <- 4

if df$word is in a & b then df$code <- 1 2
if df$word is in a & c then df$code <- 1 3
if df$word is in a & d then df$code <- 1 4
if df$word is in b & c then df$code <- 2 3
if df$word is in b & d then df$code <- 2 4
if df$word is in c & d then df$code <- 3 4

and so on.

What is the most efficient way to do this?

Example

df <- data.frame(word = c("book", "worm", "digital", "context"))

a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")

Expected output:

book    1 2
worm    2 3
digital 3
context 1 2 3 4

Tags: r, list, character

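Solution

We can use a double sapply loop: for each element of the data frame column, we check which of the lists contain it and collect the corresponding list numbers.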

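# Collect the four lists so they can be referred to by position
lst <- list(a, b, c, d)
# df$V1 is the column name produced by read.table() in the Data section.
# For each word, find which lists contain a match and join their indices.
df$output <- sapply(df$V1, function(x) paste0(which(sapply(lst, 
                           function(y) any(grepl(x,y)))), collapse = ","))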

df
#       V1  output
#1    book     1,2
#2    worm     2,3
#3 digital       3
#4 context 1,2,3,4
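
Note that grepl() treats each word as a regular expression and also matches substrings, so "book" would match a list entry such as "bookworm". If exact membership is what is wanted, a minimal sketch using %in% might look like the following. It assumes the question's data frame with a `word` column, takes the column name `code` from the question's pseudocode, and joins the indices with spaces as in the expected output.

```r
# Data from the question
df <- data.frame(word = c("book", "worm", "digital", "context"),
                 stringsAsFactors = FALSE)
a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")

lst <- list(a, b, c, d)

# For each word, find the positions of the lists that contain it exactly
df$code <- vapply(df$word, function(x) {
  hits <- which(vapply(lst, function(y) x %in% y, logical(1)))
  paste(hits, collapse = " ")
}, character(1), USE.NAMES = FALSE)

df
```

vapply() is used here instead of sapply() only so that the return type is fixed; a word that appears in none of the lists ends up with an empty string.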

Data

df <- read.table(text = "book
      worm
      digital
      context")
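
The read.table() call above names the column V1, whereas the question's data frame uses `word`. As a sketch with the question's data, the same codes can also be built from a logical membership matrix, which calls %in% once per list rather than once per word. The names `member` and `code` are illustrative, and whether this is actually faster depends on the data; it has not been benchmarked here.

```r
# Data from the question, with the four lists gathered in a named list
df <- data.frame(word = c("book", "worm", "digital", "context"),
                 stringsAsFactors = FALSE)
lst <- list(a = c("book", "context"),
            b = c("book", "worm", "context"),
            c = c("digital", "worm", "context"),
            d = "context")

# One logical column per list: TRUE where the word is in that list
member <- do.call(cbind, lapply(lst, function(y) df$word %in% y))

# Collapse the TRUE positions of each row into a space-separated code
df$code <- apply(member, 1, function(r) paste(which(r), collapse = " "))

df
```

The intermediate matrix can be worth keeping, since it also answers questions such as how many words fall in each list (colSums(member)).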
