r - Get the id of pattern matches
Problem description
I want to extract the collocates of the lemma GO.
df <- data.frame(
  id = 1:6,
  go = c("go after it", "here we go", "he went bust", "go get it go",
         "i 'm gon na go", "she 's going berserk"))
I can extract the collocates like this:
# lemma forms:
lemma_GO <- c("go", "goes", "going", "gone", "went", "gon na")
# alternation pattern:
pattern_GO <- paste0("\\b(", paste0(lemma_GO, collapse = "|"), ")\\b")
# extraction:
library(stringr)
df_GO <- data.frame(
  left = unlist(str_extract_all(df$go, paste0("('?\\b[a-z']+\\b|^)(?=\\s?", pattern_GO, ")"))),
  node = unlist(str_extract_all(df$go, pattern_GO)),
  right = unlist(str_extract_all(df$go, paste0("(?<=\\s?", pattern_GO, "\\s?)('?\\b[a-z']+\\b|$)")))
)
The result is fine, but it does not show the id values, i.e. I cannot tell which "sentence" each match was extracted from:
df_GO
  left   node   right
1          go   after
2   we     go
3   he   went    bust
4          go     get
5   it     go
6   'm gon na      go
7   na     go
8   's  going berserk
How can I get the id values so that the result looks like this:
df_GO
  left   node   right id
1          go   after  1
2   we     go          2
3   he   went    bust  3
4          go     get  4
5   it     go          4
6   'm gon na      go  5
7   na     go          5
8   's  going berserk  6
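Solution
You are almost there. What you need to do is iterate over your data frame and perform the extraction on each row separately. That also lets you pick up and store the id of the row being processed.
To do this, we wrap your steps in a function and add the id to its result.
The following uses the tidyverse packages, in particular {purrr} for the iteration.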
library(tidyverse)
# wrap your call into a function that we perform on each row
extract_GO <- function(df_row){
  # the data frame is the last expression, so it is the function's return value
  data.frame(
    id = df_row$id,  # we also store the id for the row we process
    #---------------------- your work - just adapted the variable to function call, df_row
    ## this could have stayed the same, but this way it is easier to understand
    ## what happens here
    left = unlist(str_extract_all(df_row$go, paste0("('?\\b[a-z']+\\b|^)(?=\\s?", pattern_GO, ")"))),
    node = unlist(str_extract_all(df_row$go, pattern_GO)),
    right = unlist(str_extract_all(df_row$go, paste0("(?<=\\s?", pattern_GO, "\\s?)('?\\b[a-z']+\\b|$)")))
  )
}
# --------------- next we iterate with purrr
## try df %>% group_split(id) to see what group_split() does
df %>%
  group_split(id) %>%           # splits data frame into a list of bins, i.e. one per id
  purrr::map_dfr(extract_GO)    # now we iterate over the bins with our function and row-bind the results
This yields:
  id left   node   right
1  1          go   after
2  2   we     go
3  3   he   went    bust
4  4          go     get
5  4   it     go
6  5   'm gon na      go
7  5   na     go
8  6   's  going berserk
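As a possible alternative that avoids iterating row by row, the id column can also be built directly from the number of matches per sentence. The sketch below is not part of the answer above: it relies on `str_extract_all()` returning one list element per input string, so `lengths()` gives the match count per sentence and `rep()` repeats each id that many times. The names `node_list` and `node_ids` are made up for illustration, and the approach assumes the left and right extractions return the same number of matches per sentence as the node extraction, which is what the original code already relies on.

```r
library(stringr)

df <- data.frame(
  id = 1:6,
  go = c("go after it", "here we go", "he went bust", "go get it go",
         "i 'm gon na go", "she 's going berserk"),
  stringsAsFactors = FALSE
)

lemma_GO <- c("go", "goes", "going", "gone", "went", "gon na")
pattern_GO <- paste0("\\b(", paste0(lemma_GO, collapse = "|"), ")\\b")

# One list element per sentence, holding that sentence's matches
node_list <- str_extract_all(df$go, pattern_GO)

# Repeat each id once per match found in its sentence
node_ids <- data.frame(
  id   = rep(df$id, times = lengths(node_list)),
  node = unlist(node_list)
)
```

The same `rep(df$id, times = lengths(node_list))` vector could then be added as an `id` column to the original `df_GO`, provided all three columns line up match for match.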