r - Mapping topics to reviews in R
Problem description
I have two datasets: review data and topic data.
dput of my review data
structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved",
"Sports and physical exercise need to be given importance"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
dput of my topic data
structure(list(word = structure(2:1, .Label = c("canteen food",
"sports and physical"), class = "factor"), Topic = structure(2:1, .Label = c("Canteen",
"Sports "), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
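dput of my expected output. I want to look up the words from the topic data that appear in the reviews and map the corresponding topic onto the review data.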
structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved",
"Sports and physical exercise need to be given importance"), class = "factor"),
Topic = structure(2:1, .Label = c("Canteen", "Sports "), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
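Solution
Amateur here. I did this with base R rather than dplyr because I'm not the best with join functions.
Below, initialize your dfs. I added more examples to make sure everything works. I also chose not to use factors, since they make assigning strings messy later on.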
# initialize your dfs
review <- data.frame("Review" = c("Canteen Food could be improved",
                                  "Sports and physical exercise need to be given importance",
                                  "canteen food x2",
                                  "this is my sports and physical",
                                  "SPORTS AND PHYSICAL",
                                  "meme",
                                  "canteen and food",
                                  "this is my meme",
                                  "memethis"),
                     stringsAsFactors = FALSE)
topic <- data.frame("word" = c("canteen food", "sports and physical", "meme"),
                    "Topic" = c("Canteen", "Sports", "meme_cat"),
                    stringsAsFactors = FALSE)
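To illustrate why factors get in the way here, the following is a minimal sketch (the vectors f and s are made up for this example and are not part of the data above). Assigning a string that is not an existing level of a factor produces NA with a warning, while a plain character vector accepts any string.

```r
# Hypothetical vectors for illustration only
f <- factor(c("Canteen", "Sports"))
# "meme_cat" is not an existing level, so R warns and stores NA instead
suppressWarnings(f[1] <- "meme_cat")
is.na(f[1])

# A character vector has no fixed set of levels, so the assignment just works
s <- c("Canteen", "Sports")
s[1] <- "meme_cat"
s
```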
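Then just use some nested for loops to go over the words you want, find the matching strings, and assign the associated topic. Initialize everything before the for loop.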
# initialize new column to write into in loop
review$Topic <- NA_character_
# initialize before for loop: one flag per review row
a <- rep(FALSE, nrow(review))
# loop over words in topic and find string matches in review. if so, assign review$Topic = Topic
for (i in seq_len(nrow(topic))) {
  for (j in seq_len(nrow(review))) {
    a[j] <- grepl(topic$word[i], review$Review[j], ignore.case = TRUE)
  }
  if (any(a)) {
    review$Topic[a] <- topic$Topic[i]
  }
}
review
# Review Topic
#1 Canteen Food could be improved Canteen
#2 Sports and physical exercise need to be given importance Sports
#3 canteen food x2 Canteen
#4 this is my sports and physical Sports
#5 SPORTS AND PHYSICAL Sports
#6 meme meme_cat
#7 canteen and food <NA>
#8 this is my meme meme_cat
#9 memethis meme_cat
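The same matching can also be written without explicit loops. The following is a rough sketch and not the answer's own method: map_topics is a hypothetical helper name, and it assumes that when a review matches several words the first matching row of the topic data should win (in the loop version the last matching topic overwrites earlier ones). On data like the example above, where no review matches more than one word, both approaches should agree.

```r
# Hypothetical helper (not from the original answer): returns one Topic per review,
# or NA when none of the words occurs in the review.
map_topics <- function(reviews, words, topics) {
  # One logical column per word: TRUE where the word occurs in a review, ignoring case
  hits <- vapply(words,
                 function(w) grepl(w, reviews, ignore.case = TRUE),
                 logical(length(reviews)))
  # Force a reviews-by-words matrix even when there is only one review
  hits <- matrix(hits, nrow = length(reviews))
  # Index of the first matching word per review (NA if nothing matched)
  first <- apply(hits, 1, function(r) which(r)[1])
  topics[first]
}

# Small made-up data frames in the same shape as the ones above
review2 <- data.frame(Review = c("Canteen Food could be improved",
                                 "SPORTS AND PHYSICAL",
                                 "canteen and food",
                                 "memethis"),
                      stringsAsFactors = FALSE)
topic2 <- data.frame(word = c("canteen food", "sports and physical", "meme"),
                     Topic = c("Canteen", "Sports", "meme_cat"),
                     stringsAsFactors = FALSE)

review2$Topic <- map_topics(review2$Review, topic2$word, topic2$Topic)
review2
```

Note that grepl treats each word as a regular expression, so a word containing characters such as "." or "(" would need escaping, and substrings still match (for example "memethis" matches "meme"), just as in the loop version.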