Text analysis in R: I can't find a way to remove "'s" or other contractions

Problem description

I am trying to remove the "'s" and other contraction letters from my text data. This is the preprocessing code I am currently using:

x <- demtweets$Tweet
x <- paste(unlist(x), collapse = " ")
x <- stringi::stri_trans_general(x, "latin-ascii")

x <- gsub(" '[A-z] ", " ", x)
x <- gsub("&amp", "", x)
x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x)
x <- gsub("@\\w+", "", x)
x <- gsub("[[:punct:]]", "", x)
x <- gsub("[[:digit:]]", "", x)
x <- gsub("http\\w+", "", x)
x <- gsub("[ \t]{2,}", "", x)
x <- gsub("^\\s+|\\s+$", "", x)
x <- replace_contraction(x, contraction.key = lexicon::key_contractions,
                         ignore.case = TRUE)
x <- replace_contraction(x,
                         contraction = qdapDictionaries::contractions, replace = NULL,
                         ignore.case = TRUE)

xdfm <- dfm(x, stem = FALSE, remove_punct = TRUE, tolower = TRUE,
            remove_twitter = TRUE, remove_numbers = TRUE,
            remove = c(stopwords("english"), "http", "https", "rt", "t.co"))

textplot_wordcloud(xdfm, min_count = 6, random_order = FALSE,
                   rotation = .25,
                   color = RColorBrewer::brewer.pal(8, "Dark2"))
topfeatures(xdfm, 100)

None of these commands seem to fix the problem. Any help would be appreciated.

Tags: r, nlp

Solution
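Two details in the question's code probably explain the symptom. The pattern " '[A-z] " only matches an apostrophe that has a space in front of it and a single letter followed by a space after it, so it never touches a form like "Biden's". Both replace_contraction calls also run after "[[:punct:]]" has already removed every apostrophe, so there is nothing left for them to match. What follows is a minimal base R sketch of one possible ordering, not a definitive fix. The function name clean_tweets, the sample strings, and the short list of apostrophe endings are assumptions made for illustration only.

```r
# Hypothetical helper: cleans a character vector of tweets.
# Key idea: deal with apostrophe forms BEFORE punctuation is stripped.
clean_tweets <- function(x) {
  # Straighten curly apostrophes (the question uses stri_trans_general for this)
  x <- gsub("[\u2018\u2019]", "'", x)
  # Drop HTML-escaped ampersands, retweet headers, mentions and URLs
  x <- gsub("&amp;?", " ", x)
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", x, perl = TRUE)
  x <- gsub("@\\w+", " ", x)
  x <- gsub("http\\S+", " ", x)
  # Strip common apostrophe endings ('s, 're, 've, 'll, 'd, 'm).
  # Assumption: this short list is illustrative; "n't" forms are not handled.
  x <- gsub("'(s|re|ve|ll|d|m)\\b", "", x, ignore.case = TRUE, perl = TRUE)
  # Only now remove the remaining punctuation and digits
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:digit:]]", " ", x)
  # Collapse runs of whitespace to ONE space (replacing with "" would glue words)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# Made-up sample input standing in for demtweets$Tweet
sample_tweets <- c(
  "RT @user: Biden's plan they're sure https://t.co/abc123 &amp; done 2020",
  "we'll see"
)
print(clean_tweets(sample_tweets))
```

If the replace_contraction calls from the question are kept, they would need to run at the same point as the apostrophe step above, while the apostrophes are still in the text. The cleaned vector can then be collapsed with paste() and passed to dfm() as in the question.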

