r - R function to filter a vector of sentences by a vector of words
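Problem description
I have been trying to extract sentences from a vector: for each word in Vector1, I want only the sentences in Vector2 (sentences are separated by |) that contain that word. Thanks in advance!
For example: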
Vector1  Vector2
One      One day, it was sunny| There was no rain| There was One dollar on the floor
Two      Two day, it was rainy| There was no sun
Three    There was Three dollars on the floor| It was wet| Three of ants on floor|
Expected output:
Key      Sentence1                              Sentence2                            Sentence3
One      One day, it was sunny                  There was One dollar on the floor
Two      Two day, it was rainy
Three    There was Three dollars on the floor   Three of ants on floor
Solution
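You can get the data into long format by splitting Vector2 on "|", keep only the rows whose sentence contains the word in Vector1, and then reshape the data back to wide format.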
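library(dplyr)
library(tidyr)

df %>%
  # one row per sentence: split Vector2 on "|" plus any whitespace after it
  separate_rows(Vector2, sep = '\\|\\s*') %>%
  # keep only sentences that contain the key as a whole word
  filter(stringr::str_detect(Vector2, paste0('\\b', Vector1, '\\b'))) %>%
  # number the remaining sentences within each key
  group_by(Vector1) %>%
  mutate(col = paste0('Sentence', row_number())) %>%
  # spread them into Sentence1, Sentence2, ... columns
  pivot_wider(names_from = col, values_from = Vector2)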
# Vector1 Sentence1 Sentence2
# <chr> <chr> <chr>
#1 One One day, it was sunny There was One dollar on the floor
#2 Two Two day, it was rainy NA
#3 Three There was Three dollars on the floor Three of ants on floor
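The `\\b` word boundaries in the `filter()` step keep a key from matching inside a longer word. The sentence "Twofold increase" below is a hypothetical example, not from the question, and base `grepl()` with `perl = TRUE` stands in for `stringr::str_detect()` so the sketch needs no packages:

```r
# Hypothetical sentences: "Twofold" contains "Two", but not as a whole word
sentences <- c("Two day, it was rainy", "Twofold increase", "There was no sun")

# A plain substring match also hits "Twofold"
plain <- grepl("Two", sentences, fixed = TRUE)

# Same pattern shape as paste0('\\b', Vector1, '\\b') in the answer
bounded <- grepl(paste0("\\b", "Two", "\\b"), sentences, perl = TRUE)

print(plain)    # c(TRUE, TRUE, FALSE)
print(bounded)  # c(TRUE, FALSE, FALSE)
```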
Data
df <- structure(list(Vector1 = c("One", "Two", "Three"),
Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
"Two day, it was rainy| There was no sun",
"There was Three dollars on the floor| It was wet| Three of ants on floor"
)), class = "data.frame", row.names = c(NA, -3L))
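If you would rather avoid the dplyr/tidyr dependency, the same split, filter, and widen steps can be sketched in base R. This is an alternative sketch, not part of the original answer; it rebuilds the same `df` so it runs on its own, and pads rows with fewer matches with `NA`, as `pivot_wider()` does:

```r
# Same data as above, so the sketch runs on its own
df <- data.frame(
  Vector1 = c("One", "Two", "Three"),
  Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
              "Two day, it was rainy| There was no sun",
              "There was Three dollars on the floor| It was wet| Three of ants on floor"),
  stringsAsFactors = FALSE
)

# 1. Split each Vector2 string on "|" and any whitespace that follows it
pieces <- strsplit(df$Vector2, "\\|\\s*", perl = TRUE)

# 2. Keep only the sentences that contain the row's key as a whole word
kept <- Map(function(key, s) s[grepl(paste0("\\b", key, "\\b"), s, perl = TRUE)],
            df$Vector1, pieces)

# 3. Pad every row to the same length with NA (out-of-range indexing yields NA),
#    then bind the rows into a character matrix
n <- max(lengths(kept))
wide <- do.call(rbind, lapply(kept, function(s) s[seq_len(n)]))

# 4. Attach the keys and name the columns Sentence1, Sentence2, ...
result <- data.frame(Vector1 = df$Vector1, wide, row.names = NULL,
                     stringsAsFactors = FALSE)
colnames(result)[-1] <- paste0("Sentence", seq_len(n))
result
```

The column count comes from the row with the most matches, so adding a key with three matching sentences would produce a Sentence3 column automatically.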