R: How do I copy rows containing one or more specific words to another dataframe?
Problem description
I'm a student and new here. I'm trying to do text analysis for my project. With the help of the community, I have successfully copied rows of data to another dataframe when a certain word appears in the sentence. But now, I would like to copy rows of data to another dataframe when one or more of several words appear in the sentence.
*df1*
ID Text
1 This apple is delicious and I like this apple a lot. But, I also like banana too.
2 This orange is nice and sweet.
3 This apple is too sweet and I would prefer orange.
4 This apple is worth the price, definitely will purchase it again from this store.
5 This pear is great!
6 Best banana that I ever had!
7 This pear was okay, but the apple is just worst.
As you can see, IDs 1, 2, 3, 4, 6 and 7 contain words such as apple, orange and banana. ID 1 contains apple and banana; ID 2, only orange; ID 3, apple and orange; ID 4, only apple; ID 6, only banana; and lastly, ID 7, apple. ID 5 contains none of them.
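To check the per-ID breakdown above, here is a minimal base R sketch. It rebuilds df1 from the question (assuming Text is stored as character) and builds a logical matrix, hits (a name chosen only for illustration), with one column per target word. A row qualifies when at least one of its entries is TRUE.

```r
# Rebuild df1 from the question so the example is self-contained
df1 <- data.frame(
  ID = 1:7,
  Text = c(
    "This apple is delicious and I like this apple a lot. But, I also like banana too.",
    "This orange is nice and sweet.",
    "This apple is too sweet and I would prefer orange.",
    "This apple is worth the price, definitely will purchase it again from this store.",
    "This pear is great!",
    "Best banana that I ever had!",
    "This pear was okay, but the apple is just worst."
  ),
  stringsAsFactors = FALSE
)

words <- c("apple", "orange", "banana")

# One logical column per word: TRUE if that word occurs as a whole word in the row
hits <- sapply(words, function(w) grepl(paste0("\\b", w, "\\b"), df1$Text))
rownames(hits) <- df1$ID
hits

# A row is kept when at least one target word was found
df1$ID[rowSums(hits) > 0]
# [1] 1 2 3 4 6 7
```

The matrix is only a way to see which words each sentence contains; the single combined pattern in the solution selects the same rows without building it.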
My objective is that, no matter whether the same word appears once or more than once, or whether only one or two of the words I have set appear in the sentence, the row is copied to another dataframe.
The result I want:
*df2*
ID Text
1 This apple is delicious and I like this apple a lot. But, I also like banana too.
2 This orange is nice and sweet.
3 This apple is too sweet and I would prefer orange.
4 This apple is worth the price, definitely will purchase it again from this store.
6 Best banana that I ever had!
7 This pear was okay, but the apple is just worst.
So no matter what, if a sentence contains apple and/or orange and/or banana, even just one of these words, the row should be copied to another dataframe.
Thanks in advance!
Solution
You can create a vector, words, containing the words you want to check for. Build a pattern with paste0 that adds word boundaries around each word and joins them with "|", then use subset to get the matching rows into df2.
words <- c('apple', 'orange', 'banana')
df2 <- subset(df1, grepl(paste0('\\b', words, '\\b', collapse = "|"), Text))
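To see what the combined pattern looks like and how the word boundaries behave, here is a small base R sketch. The demo data frame and its sentences are hypothetical, chosen only to show edge cases; they are not from the question.

```r
words <- c("apple", "orange", "banana")

# paste0 vectorises over `words`, then collapse joins the pieces with "|"
pattern <- paste0("\\b", words, "\\b", collapse = "|")
pattern
# [1] "\\bapple\\b|\\borange\\b|\\bbanana\\b"

# Hypothetical sentences (not from the question) to show the effect of \\b
demo <- data.frame(
  ID = 1:4,
  Text = c("I ate an apple.", "I ate a pineapple.", "Two oranges, please.", "Banana bread"),
  stringsAsFactors = FALSE
)

# Only ID 1 matches: "pineapple" and "oranges" fail the word boundaries,
# and "Banana" fails because grepl is case-sensitive by default
subset(demo, grepl(pattern, Text))
```

If capitalised words should also count, grepl accepts ignore.case = TRUE, which would additionally keep the "Banana bread" row. Plural forms such as "oranges" would need the pattern itself to change, which is a judgment call for your data.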
In the tidyverse, you can do:
library(dplyr)
library(stringr)
df2 <- df1 %>%
  filter(str_detect(Text, str_c('\\b', words, '\\b', collapse = "|")))