Classify sentences by the maximum number of categories they contain

Problem description

I have the following sentences:

sentences<-c("The color blue neutralizes orange yellow reflections.", 
         "Zod stabbed me with blue Kryptonite.", 
         "Because blue is your favourite colour.",
         "Red is wrong, blue is right.",
         "You and I are going to yellowstone.",
         "Van Gogh looked for some yellow at sunset.",
         "You ruined my beautiful green dress.",
         "There's nothing wrong with green.")

I want to classify them according to the following categories:

A<-c("red")
B<-c("orange")
C<-c("yellow","yellowstone")
D<-c("blue")
E<-c("green")

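The difficulty with this task is that the first sentence, for example, can be classified as D, B and C, so the resulting classification should be B+C+D. The second and third sentences are simply D. The fourth sentence is A and D, so A+D. The fifth sentence is C, and so on.

Tags: r, string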

Solution


One possibility using dplyr, purrr and tibble could be:

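library(dplyr)
library(purrr)
library(tibble)

# Flag matches per category, regroup the flags per sentence, then join the matching category names
map(lst, ~ grepl(paste(.x, collapse = "|"), sentences, ignore.case = TRUE)) %>%
 transpose() %>%
 map_chr(~ enframe(.x) %>%
          summarise(name = paste(name[unlist(value)], collapse = ",")) %>%
          pull(name))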

[1] "B,C,D" "D"     "D"     "A,D"   "C"     "C"     "E"     "E"    

where lst is:

lst <- list(A = c("red"),
B = c("orange"),
C = c("yellow","yellowstone"),
D = c("blue"),
E = c("green"))
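
For comparison, the same idea can probably be written without any packages. This is only a minimal base R sketch, not part of the original answer: the names hits and labels are made up for illustration, and it assumes the same substring matching with grepl as above (so "yellowstone" also counts as a match for "yellow").

```r
sentences <- c("The color blue neutralizes orange yellow reflections.",
               "Zod stabbed me with blue Kryptonite.",
               "Because blue is your favourite colour.",
               "Red is wrong, blue is right.",
               "You and I are going to yellowstone.",
               "Van Gogh looked for some yellow at sunset.",
               "You ruined my beautiful green dress.",
               "There's nothing wrong with green.")

lst <- list(A = c("red"),
            B = c("orange"),
            C = c("yellow", "yellowstone"),
            D = c("blue"),
            E = c("green"))

# hits (hypothetical name): logical matrix, one row per sentence and
# one column per category, TRUE where any keyword of the category occurs
hits <- sapply(lst, function(words) {
  grepl(paste(words, collapse = "|"), sentences, ignore.case = TRUE)
})

# labels (hypothetical name): for each row, join the names of the
# categories whose column is TRUE
labels <- apply(hits, 1, function(row) paste(names(lst)[row], collapse = ","))
labels
```

Using collapse = "+" in the last paste() call should give the B+C+D style asked for in the question instead of the comma-separated form.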
