r - Classify sentences by all the categories they contain
Problem description
I have the following sentences:
sentences <- c("The color blue neutralizes orange yellow reflections.",
               "Zod stabbed me with blue Kryptonite.",
               "Because blue is your favourite colour.",
               "Red is wrong, blue is right.",
               "You and I are going to yellowstone.",
               "Van Gogh looked for some yellow at sunset.",
               "You ruined my beautiful green dress.",
               "There's nothing wrong with green.")
I would like to classify them according to the following categories:
A<-c("red")
B<-c("orange")
C<-c("yellow","yellowstone")
D<-c("blue")
E<-c("green")
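The tricky part is that a sentence can belong to several categories. The first sentence, for example, can be classified as D, B and C, so its resulting classification should be B+C+D. The second and third sentences are simply D. The fourth sentence is A and D, so A+D. The fifth sentence is C, and so on.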
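The rule described above (a sentence receives every category whose keywords it contains, with the category names joined by +) can be sketched in base R. This is a minimal sketch under one assumption the question does not state: keywords should match whole words only, so "red" fires on "Red" but not inside a word such as "bored". The helper name classify_words and the extra sentence about "bored" are hypothetical, added only for illustration.

```r
# Hypothetical helper: assign every category whose keywords occur in a sentence
# as whole words, joining the category names with `sep` (e.g. "B+C+D").
classify_words <- function(sentences, lst, sep = "+") {
  # One regex per category; the \\b anchors tie each alternative to word edges,
  # so "red" matches "Red" but not the "red" inside "bored".
  patterns <- vapply(lst, function(words)
    paste0("\\b(", paste(words, collapse = "|"), ")\\b"), character(1))
  vapply(sentences, function(s) {
    # Named logical vector: TRUE for each category matched in this sentence
    matched <- vapply(patterns, grepl, logical(1), x = s,
                      ignore.case = TRUE, perl = TRUE)
    # Category names in list order; "" if nothing matched
    paste(names(lst)[matched], collapse = sep)
  }, character(1), USE.NAMES = FALSE)
}

lst <- list(A = c("red"),
            B = c("orange"),
            C = c("yellow", "yellowstone"),
            D = c("blue"),
            E = c("green"))

classify_words(c("The color blue neutralizes orange yellow reflections.",
                 "Red is wrong, blue is right.",
                 "You and I are going to yellowstone.",
                 "I was bored by the blue sky."), lst)
```

With these inputs the call should return "B+C+D", "A+D", "C" and "D"; plain substring matching would instead also tag the last sentence with A because of "bored".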
Solution
One possibility using dplyr, purrr and tibble could be:
library(dplyr)
library(purrr)
library(tibble)

map(lst, ~ grepl(paste(.x, collapse = "|"), sentences, ignore.case = TRUE)) %>%
  transpose() %>%
  map_chr(~ enframe(.x) %>%
            summarise(name = paste(name[unlist(value)], collapse = ",")) %>%
            pull(name))
[1] "B,C,D" "D" "D" "A,D" "C" "C" "E" "E"
where lst, which must be defined before running the code above, is:
lst <- list(A = c("red"),
B = c("orange"),
C = c("yellow","yellowstone"),
D = c("blue"),
E = c("green"))
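If you would rather avoid the dplyr, purrr and tibble dependencies, the same approach can be sketched in base R. This is a minimal sketch assuming the same case-insensitive substring matching as the code above: one logical column per category, then the matching category names are pasted together for each sentence.

```r
sentences <- c("The color blue neutralizes orange yellow reflections.",
               "Zod stabbed me with blue Kryptonite.",
               "Because blue is your favourite colour.",
               "Red is wrong, blue is right.",
               "You and I are going to yellowstone.",
               "Van Gogh looked for some yellow at sunset.",
               "You ruined my beautiful green dress.",
               "There's nothing wrong with green.")

lst <- list(A = c("red"),
            B = c("orange"),
            C = c("yellow", "yellowstone"),
            D = c("blue"),
            E = c("green"))

# Logical matrix: one row per sentence, one column per category (A..E);
# TRUE where any of the category's keywords occurs in the sentence
hits <- vapply(lst,
               function(words) grepl(paste(words, collapse = "|"), sentences,
                                     ignore.case = TRUE),
               logical(length(sentences)))

# For each sentence (row), join the names of the matching categories
classes <- apply(hits, 1, function(row) paste(names(lst)[row], collapse = ","))
classes
```

This should reproduce the same eight labels as the dplyr version; pass collapse = "+" instead of "," to get the B+C+D style used in the question.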