How do I copy rows containing one or more different words to another dataframe

Problem description

I'm a student and new here. I'm trying to do text analysis for my project. With the help of the community, I have successfully copied rows of data to another dataframe when a certain word appears in the sentence. Now I would like to copy rows of data to another dataframe when one or more of several words appear in the sentence.

*df1*
ID    Text

1     This apple is delicious and I like this apple a lot. But, I also like banana too. 
2     This orange is nice and sweet. 
3     This apple is too sweet and I would prefer orange. 
4     This apple is worth the price, definitely will purchase it again from this store. 
5     This pear is great! 
6     Best banana that I ever had! 
7     This pear was okay, but the apple is just worst. 

As you can see, IDs 1, 2, 3, 4, 6 and 7 contain words like apple, orange and banana. ID 1 contains apple and banana; ID 2, only orange; ID 3, apple and orange; ID 4, only apple; ID 6, only banana; and lastly ID 7, apple.

My objective: no matter whether a word appears once or several times, and whether the sentence contains only one of the words I have set or several of them, the row should be copied to another dataframe.

Result that I want

*df2*
ID    Text

1     This apple is delicious and I like this apple a lot. But, I also like banana too. 
2     This orange is nice and sweet. 
3     This apple is too sweet and I would prefer orange. 
4     This apple is worth the price, definitely will purchase it again from this store. 
6     Best banana that I ever had! 
7     This pear was okay, but the apple is just worst. 

So whenever a sentence contains apple and/or orange and/or banana, even just one of them, the row should be copied to another dataframe.

Thanks in advance!

Tags: r

Solution


You can create a vector of the words you want to look for in Text. Build a single regular expression with paste0, wrapping each word in word boundaries (\\b) and joining the alternatives with "|", then use subset to keep the matching rows as df2.

words <- c('apple', 'orange', 'banana')
df2 <- subset(df1, grepl(paste0('\\b', words, '\\b', collapse = "|"), Text))
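To see what the pattern looks like and check the result, here is a minimal reproducible sketch. It assumes df1 has the two columns shown in the question (ID and Text); the `pattern` variable is introduced only for illustration.

```r
# Rebuild the example data from the question (assumed structure of df1).
df1 <- data.frame(
  ID = 1:7,
  Text = c(
    "This apple is delicious and I like this apple a lot. But, I also like banana too.",
    "This orange is nice and sweet.",
    "This apple is too sweet and I would prefer orange.",
    "This apple is worth the price, definitely will purchase it again from this store.",
    "This pear is great!",
    "Best banana that I ever had!",
    "This pear was okay, but the apple is just worst."
  ),
  stringsAsFactors = FALSE
)

words <- c("apple", "orange", "banana")

# paste0() wraps every word in \\b ... \\b, then collapse = "|" joins the
# pieces into one alternation: \bapple\b|\borange\b|\bbanana\b
pattern <- paste0("\\b", words, "\\b", collapse = "|")

# grepl() returns TRUE for each row whose Text matches at least one word,
# regardless of how many times or how many of the words occur.
df2 <- subset(df1, grepl(pattern, Text))
df2$ID
```

Only row 5 is dropped, since it mentions none of the three words.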

In the tidyverse, you can do:

library(dplyr)
library(stringr)

df2 <- df1 %>% 
        filter(str_detect(Text, str_c('\\b', words, '\\b', collapse = "|")))
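The word boundaries matter because they keep the pattern from matching inside longer words, and the match is case-sensitive by default. The sketch below illustrates both points in base R; the `samples` vector is made up for this illustration and is not part of the question's data.

```r
words <- c("apple", "orange", "banana")
pattern <- paste0("\\b", words, "\\b", collapse = "|")

# Hypothetical test sentences, not from the original data.
samples <- c(
  "I ate a pineapple.",    # "apple" is inside a longer word
  "Two bananas, please.",  # plural, so no boundary right after "banana"
  "An Apple a day.",       # capitalised
  "One apple."             # exact word
)

# Case-sensitive match: only the exact lowercase word is found.
strict <- grepl(pattern, samples)

# ignore.case = TRUE also accepts "Apple".
relaxed <- grepl(pattern, samples, ignore.case = TRUE)

data.frame(samples, strict, relaxed)
```

If plurals or capitalised words should count too, the pattern or the matching options would need to be adjusted accordingly; whether that is wanted depends on the analysis.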
