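r - Extracting multiple substrings after certain characters in a string using stringi in R
Problem description
I have a large data frame in R with a column that looks like this, where each sentence is one row: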
data <- data.frame(
datalist = c("anarchism is a wiki/political_philosophy that advocates wiki/self-governance societies based on voluntary institutions",
"these are often described as wiki/stateless_society although several authors have defined them more specifically as institutions based on non- wiki/hierarchy or wiki/free_association_(communism_and_anarchism)",
"anarchism holds the wiki/state_(polity) to be undesirable unnecessary and harmful",
"while wiki/anti-statism is central anarchism specifically entails opposing authority or hierarchical organisation in the conduct of all human relations"),
stringsAsFactors=FALSE)
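I want to extract all the words that follow "wiki/" and put them in another column.
So the first row should be "political_philosophy self-governance", the second row "stateless_society hierarchy free_association_(communism_and_anarchism)", the third row "state_(polity)", and the fourth row "anti-statism".
I definitely want to use stringi because it is a huge data frame. Thanks in advance for any help.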
I tried
stri_extract_all_fixed(data$datalist, "wiki")[[1]]
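but this only extracts the word "wiki" itself.
Solution
You can do this with a regular expression. By using the stri_match_ functions instead of stri_extract_, we can use parentheses to create capture groups, which let us pull out only part of what the regular expression matched. In the result below, each row of the data frame produces one list element containing a character matrix, with the whole match in the first column and each capture group in the following columns: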
library(stringi)
match <- stri_match_all_regex(data$datalist, "wiki/([\\w-()]*)")
match
[[1]]
     [,1]                        [,2]
[1,] "wiki/political_philosophy" "political_philosophy"
[2,] "wiki/self-governance"      "self-governance"

[[2]]
     [,1]                                              [,2]
[1,] "wiki/stateless_society"                          "stateless_society"
[2,] "wiki/hierarchy"                                  "hierarchy"
[3,] "wiki/free_association_(communism_and_anarchism)" "free_association_(communism_and_anarchism)"

[[3]]
     [,1]                  [,2]
[1,] "wiki/state_(polity)" "state_(polity)"

[[4]]
     [,1]                [,2]
[1,] "wiki/anti-statism" "anti-statism"
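To make the shape of that result concrete, here is a minimal sketch on two made-up strings (the vector `x` and the names `m`, `terms_first`, and `no_match_is_na` are illustrative, not from the question). It writes the character class with an escaped hyphen, which should match the same characters as the pattern above, and it shows what a string without any "wiki/" link looks like in the output.

```r
library(stringi)

# Two made-up inputs: one with links, one without (assumption for illustration)
x <- c("see wiki/state_(polity) and wiki/anti-statism", "no links here")

# Word characters, parentheses and a literal hyphen after "wiki/"
m <- stri_match_all_regex(x, "wiki/([\\w()\\-]+)")

# Column 1 is the whole match, column 2 is the first capture group
terms_first <- m[[1]][, 2]

# With the default omit_no_match = FALSE, a string with no match
# yields a single row of NA values instead of an empty matrix
no_match_is_na <- all(is.na(m[[2]]))
```

The NA row matters for the next step: pasting it directly would produce the text "NA", so rows without links need to be filtered or handled explicitly.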
You can then use an apply-family function to reshape the result into whatever form you need:
match <- stri_match_all_regex(data$datalist, "wiki/([\\w-()]*)")
sapply(match, function(x) paste(x[,2], collapse = " "))
[1] "political_philosophy self-governance"
[2] "stateless_society hierarchy free_association_(communism_and_anarchism)"
[3] "state_(polity)"
[4] "anti-statism"
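Since the question asked for the result in another column, the following sketch stores it directly in the data frame. The column name `wiki_terms`, the shortened sample sentences, and the extra row without links are assumptions added for illustration. It uses `vapply` rather than `sapply` so the result is guaranteed to be a character vector, and it drops NA values so a row with no links becomes an empty string. As a cross-check, it also shows one possible alternative: a lookbehind with `stri_extract_all_regex`, which returns only the text after "wiki/" without needing a capture group.

```r
library(stringi)

# Shortened sample data; the third row is a made-up sentence with no links
data <- data.frame(
  datalist = c(
    "anarchism is a wiki/political_philosophy that advocates wiki/self-governance societies",
    "anarchism holds the wiki/state_(polity) to be undesirable",
    "a sentence without any links"
  ),
  stringsAsFactors = FALSE
)

# Capture-group approach: column 2 of each matrix holds the term
matches <- stri_match_all_regex(data$datalist, "wiki/([\\w()\\-]+)")

data$wiki_terms <- vapply(
  matches,
  function(m) {
    terms <- m[, 2]
    # Drop the NA placeholder that marks a row with no match
    paste(terms[!is.na(terms)], collapse = " ")
  },
  character(1)
)

# Lookbehind approach: match only what follows "wiki/"
extracted <- stri_extract_all_regex(
  data$datalist, "(?<=wiki/)[\\w()\\-]+",
  omit_no_match = TRUE  # no match gives character(0) instead of NA
)
via_lookbehind <- vapply(extracted, paste, character(1), collapse = " ")
```

Both approaches should give the same column here; which one is faster on a very large data frame is something to measure on your own data rather than assume.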