r - Match names with R for each element of dataframe
Problem description
I have 2 data frames:
data1 <- data.frame(names = c("ALBERT | ALBERTIS 2",
"PIERRE | JEAN | ALBERT",
"ALBERTOS"))
data2 <- data.frame(names_search = c("ALBERT", "PIERRE"))
I want to know whether each whole WORD of data2 is present in data1. A new column in data1 will contain the matched elements.
So I want a result like:
data3 <- data.frame(names = c("ALBERT | ALBERTOS | ALBERT 2",
"ALBERT | ALBERTOS | ALBE 2",
"PIERRE | PIERRE 2 | PIERRE_SECOND | PIERRE_SECOND 2"),
names_search = c("ALBERT", "ALBERT | PIERRE", ""))
Do you have any idea how to do this?
I tried this with a double loop (I hope there is a better way), but it failed.
for( i in 1:nrow(data1)){
  result <- ""
  for(j in 1: nrow(data2)){
    present <- grepl(eval(parse(text = paste0('\\<',data2$names_search[j],'\\>'))), data1$names[i], fixed = T)
    # I check if the whole word data[j] is present in data1[i]
    if(present ==T){
      result <- paste(result, data2$names_search[j], sep= "|")
    }
  }
  data1$names_search[i] <- result
}
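Solution
We can use strsplit to split the strings (i.e. each row) at " | ". Note that data2 here is the version listed under Data, which differs from the one in the question. After that, we simply subset the pieces of each iteration using %in% against the matching vector data2$names_search. Finally, if handles the no-match case, and else paste collapses the result into the desired form.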
data1 <- transform(
  data1,
  # split each row at " | ", then keep only the pieces found in data2$names_search
  names_search=sapply(strsplit(as.character(data1$names), " | ", fixed=TRUE), function(x) {
    out <- x[x %in% data2$names_search]
    if (length(out) == 0) NA_character_  # no match in this row
    else paste(out, collapse=" | ")
  }))
Result
data1
#                    names names_search
# 1    ALBERT | ALBERTIS 2   ALBERTIS 2
# 2 PIERRE | JEAN | ALBERT       PIERRE
# 3               ALBERTOS         <NA>
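For comparison, here is a minimal sketch of the same split-and-subset idea applied to the data exactly as posted in the question, where data2 holds "ALBERT" and "PIERRE". The names q1, q2 and match_whole_elements are hypothetical, and stringsAsFactors = FALSE is an assumption made so that as.character is not needed.

```r
# Data as posted in the question (stringsAsFactors = FALSE is an assumption)
q1 <- data.frame(names = c("ALBERT | ALBERTIS 2",
                           "PIERRE | JEAN | ALBERT",
                           "ALBERTOS"),
                 stringsAsFactors = FALSE)
q2 <- data.frame(names_search = c("ALBERT", "PIERRE"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: keep the pieces of each row that equal a search term
match_whole_elements <- function(x, terms, sep = " | ") {
  pieces <- strsplit(x, sep, fixed = TRUE)   # literal split, no regex
  vapply(pieces, function(p) {
    hit <- p[p %in% terms]                   # exact matches only
    if (length(hit) == 0) NA_character_ else paste(hit, collapse = sep)
  }, character(1))
}

q1$names_search <- match_whole_elements(q1$names, q2$names_search)
q1$names_search
# [1] "ALBERT"          "PIERRE | ALBERT" NA
```

This compares whole elements between the separators, so "ALBERTIS 2" and "ALBERTOS" do not count as matches for "ALBERT". An element such as "ALBERT 2" would not match either, because it is not identical to the search term.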
Data
data1 <- structure(list(names = structure(c(1L, 3L, 2L), .Label = c("ALBERT | ALBERTIS 2",
"ALBERTOS", "PIERRE | JEAN | ALBERT"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
data2 <- structure(list(names_search = structure(1:2, .Label = c("ALBERTIS 2",
"PIERRE"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
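As a side note on the loop in the question: fixed = T tells grepl to treat the pattern as a literal string, so '\\<' and '\\>' are not interpreted as word boundaries, and wrapping the pattern in eval(parse(text = ...)) is not needed (it would most likely fail to parse). The following is only a sketch of a regex-based whole-word variant, assuming the search terms contain no regex metacharacters. The names names_vec, terms, hits and found are hypothetical.

```r
names_vec <- c("ALBERT | ALBERTIS 2", "PIERRE | JEAN | ALBERT", "ALBERTOS")
terms <- c("ALBERT", "PIERRE")

# One logical column per search term: TRUE where the term occurs as a whole word.
# Assumption: terms contain no regex metacharacters (otherwise escape them first).
hits <- sapply(terms, function(term) {
  grepl(paste0("\\b", term, "\\b"), names_vec, perl = TRUE)
})

# Collapse the matched terms of each row; NA when nothing matched
found <- apply(hits, 1, function(row) {
  if (any(row)) paste(terms[row], collapse = " | ") else NA_character_
})
found
# [1] "ALBERT"          "ALBERT | PIERRE" NA
```

Unlike the strsplit approach, this matches a term anywhere it appears as a whole word, so an element such as "ALBERT 2" would also count as a match for "ALBERT", while "ALBERTOS" still would not.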