R function to filter a vector of sentences using a vector of words

Problem description

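I have been trying to extract sentences from a vector, keeping only the sentences that contain the matching word from another vector. An image with the correct formatting is attached. Thanks in advance!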

For example:

Vector1    Vector2
One        One day, it was sunny| There was no rain| There was One dollar on the floor
Two        Two day, it was rainy| There was no sun
Three      There was Three dollars on the floor| It was wet| Three of ants on floor|

Expected result:

Key        Sentence1                              Sentence2                           Sentence3
One        One day, it was sunny                  There was One dollar on the floor
Two        Two day, it was rainy
Three      There was Three dollars on the floor   Three of ants on floor


Tags: r, vector

Solution


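You can split Vector2 on "|" to get the data in long format, keep only the rows whose sentence contains the word in Vector1, and then reshape the data back to wide format.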

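library(dplyr)
library(tidyr)
# stringr must also be installed (it is called as stringr::str_detect below)

df %>%
  # one row per sentence: split on "|" and drop the whitespace after it
  separate_rows(Vector2, sep = '\\|\\s*') %>%
  # keep sentences that contain the row's keyword as a whole word
  filter(stringr::str_detect(Vector2, paste0('\\b', Vector1, '\\b'))) %>%
  # number the surviving sentences within each keyword
  group_by(Vector1) %>%
  mutate(col = paste0('Sentence', row_number())) %>%
  # spread back to one row per keyword
  pivot_wider(names_from = col, values_from = Vector2)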

# Vector1 Sentence1                            Sentence2                        
#  <chr>   <chr>                                <chr>                            
#1 One     One day, it was sunny                There was One dollar on the floor
#2 Two     Two day, it was rainy                NA                               
#3 Three   There was Three dollars on the floor Three of ants on floor       

Data

df <- structure(list(Vector1 = c("One", "Two", "Three"), 
Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
"Two day, it was rainy| There was no sun", 
"There was Three dollars on the floor| It was wet| Three of ants on floor"
)), class = "data.frame", row.names = c(NA, -3L))
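If you would rather avoid the dplyr/tidyr dependencies, the same three steps (split on "|", keep the sentences that contain the row's keyword as a whole word, widen) can be sketched in base R. This is an alternative sketch, not part of the original answer; it assumes the keywords in Vector1 contain no regex metacharacters, and the intermediate names `pieces`, `kept` and `result` are only illustrative.

```r
# Sample data from the question
df <- data.frame(
  Vector1 = c("One", "Two", "Three"),
  Vector2 = c("One day, it was sunny| There was no rain| There was One dollar on the floor",
              "Two day, it was rainy| There was no sun",
              "There was Three dollars on the floor| It was wet| Three of ants on floor"),
  stringsAsFactors = FALSE
)

# Step 1: split each Vector2 string on "|" plus any whitespace that follows it
pieces <- strsplit(df$Vector2, "\\|\\s*", perl = TRUE)

# Step 2: keep only the pieces that contain the row's keyword as a whole word
# (assumes keywords have no regex metacharacters)
kept <- mapply(function(key, sents) {
  sents[grepl(paste0("\\b", key, "\\b"), sents, perl = TRUE)]
}, df$Vector1, pieces, SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Step 3: pad every row with NA to the same length and bind into wide columns
n <- max(lengths(kept), 1L)
rows <- lapply(kept, function(s) c(s, rep(NA_character_, n - length(s))))
wide <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(wide) <- paste0("Sentence", seq_len(n))

result <- data.frame(Vector1 = df$Vector1, wide, stringsAsFactors = FALSE)
print(result)
```

Padding with NA plays the same role as the NA that pivot_wider produces for a keyword with fewer matching sentences, such as the Two row here.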
