r - Wordcloud in R includes the characters “ and ”; how can I remove them?

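Problem description

I am trying to make a word cloud from a Sherlock Holmes story. The problem is that the most frequent "words" are the curly quotation marks “ and ”.

I can't remove them using the tm_map function with removeWords. This is what I tried: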

docs <- tm_map(docs, removeWords, c('“'))

Tags: r, word-cloud

Solution

You can use the removePunctuation function from the tm package.

library(tm)
library(janeaustenr)

# With Punctuation
data("prideprejudice")
prideprejudice[30]

# Punctuation Removed
prideprejudice <- removePunctuation(prideprejudice)
prideprejudice[30]
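
For curly quotes specifically, here is a minimal sketch on a small made-up corpus. It assumes a tm version in which removePunctuation accepts the ucp argument (matching on Unicode punctuation rather than only the ASCII [[:punct:]] class); the sample sentences and the names docs and cleaned are illustrative and are not taken from the question's data. removeWords probably fails in the question because it matches whole words at word boundaries, and a lone quotation mark is not a word.

```r
library(tm)

# Hypothetical two-document corpus containing curly quotes
docs <- VCorpus(VectorSource(c("“Elementary,” said Holmes.",
                               "“You see, but you do not observe.”")))

# ucp = TRUE asks removePunctuation to match Unicode punctuation,
# which covers the curly quotes “ and ”
docs <- tm_map(docs, removePunctuation, ucp = TRUE)

# Pull the cleaned text back out as a plain character vector
cleaned <- unname(vapply(docs,
                         function(d) paste(as.character(d), collapse = " "),
                         character(1)))
cleaned
```

If the default call (without ucp) already removes the quotes on your system, the extra argument is harmless; whether the default class matches non-ASCII punctuation can depend on platform and locale.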

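You can also use the tidytext package. Its unnest_tokens function strips punctuation automatically. You may also want to drop stop words, which you can do like this: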

library(tm)
library(tidytext)
library(janeaustenr)
library(dplyr)

data("prideprejudice")
data(stop_words)

prideprej_tibble <- tibble(text=prideprejudice)

prideprej_words <- prideprej_tibble %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")  # join explicitly on the word column
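
To go from the tokenized words to an actual cloud, one possible next step is to count word frequencies and pass them to a plotting function. The sketch below rebuilds the same pipeline so it runs on its own; word_counts is an illustrative name, and drawing with the wordcloud package is an assumption (the question does not say which plotting package is used), so that call is skipped if the package is not installed.

```r
library(dplyr)
library(tidytext)
library(janeaustenr)

data("prideprejudice")
data(stop_words)

# Tokenize, drop stop words, then count how often each word occurs
word_counts <- tibble(text = prideprejudice) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

head(word_counts)

# Assumption: the wordcloud package is used for drawing; skipped if absent
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(word_counts$word, word_counts$n, max.words = 50)
}
```

Because punctuation is dropped during tokenization, quotation marks should not appear as separate entries in word_counts.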

See here for more information.
