Why can't R read the text file?

Problem description

I'm trying to get R to read my text file and do some text mining, but the steps I'm following aren't working and I don't know what's wrong. Could someone please help?

library(tm)
setwd("E://")
path="E:/KEYWORDS"
text<-readLines("KEYWORDS.txt")
corpus<- Corpus(VectorSource(text))
corpus<- tm_map(corpus,tolower)
corpus<- tm_map(corpus,removePunctuation)
corpus<-tm_map(corpus,stripWhitespace)
corpus<-Corpus(VectorSource(corpus))
tdm =TermDocumentMatrix(corpus,PlainTextDocument)
findFreTerms(tdm,lowfreq=2)

And it shows:

Warning message:
In tm_map.SimpleCorpus(corpus, removePunctuation) :
transformation drops documents
tdm =TermDocumentMatrix(corpus,PlainTextDocument)
Error: is.list(control) is not TRUE

And if I do this:

str(readLines("KEYWORDS.txt"))
paste(str(readLines("KEYWORDS.txt")),collapse=" ")
text<-paste(str(readLines("KEYWORDS.txt")),collapse=" ")
gsub(pattern="//W", replace="  ", text)
text<-gsub(pattern="//W",replace=" ",text)
gsub(pattern="//d", replace=" ", text)
text<-gsub(pattern="//d", replace=" ", text1)
tolower(text)
text<-tolower(text)
text

It shows that the text is NULL or contains 0 characters. Why?

Tags: r, text-mining

Solution


tdm =TermDocumentMatrix(corpus,PlainTextDocument)
Error: is.list(control) is not TRUE

That's because you passed PlainTextDocument as the second argument to TermDocumentMatrix, where it expects a list of control options. Read the documentation for TermDocumentMatrix to see which control options are valid.
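
As a rough sketch of what a working version might look like: the inline `text` vector below is a hypothetical stand-in for `readLines("KEYWORDS.txt")`, and the `wordLengths` setting is only an illustrative choice of control option, not something your task requires. I've also assumed `VCorpus` with `content_transformer(tolower)`, which should avoid the "transformation drops documents" warning, and used `findFreqTerms`, since `findFreTerms` in the question looks like a typo.

```r
library(tm)

# Hypothetical stand-in for readLines("KEYWORDS.txt")
text <- c("Text mining with R.", "R reads text; text mining is fun!")

# Build the corpus once; there is no need to wrap it in Corpus() a second time
corpus <- VCorpus(VectorSource(text))

# Wrap base functions such as tolower so the documents keep their structure
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)

# control must be a list of options (illustrative: keep terms of any length)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Terms that occur at least twice across the whole corpus
freq_terms <- findFreqTerms(tdm, lowfreq = 2)
freq_terms
```

If you don't need any options, calling `TermDocumentMatrix(corpus)` with no second argument also works.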

You say you are doing this by "following the steps" but you should understand the steps first.
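
As for the second snippet: `str()` only prints a summary and returns NULL, so `paste(str(...), collapse=" ")` gives you an empty string. The regex escapes also need backslashes (`"\\W"`, `"\\d"`) rather than forward slashes, and `text1` is never defined. A minimal sketch, with a hypothetical `raw_lines` vector standing in for your file:

```r
# Hypothetical stand-in for readLines("KEYWORDS.txt")
raw_lines <- c("Text Mining 101,", "with R!")

# str() prints to the console and returns NULL, so this collapses to ""
broken <- paste(str(raw_lines), collapse = " ")
nchar(broken)  # 0

# Paste the lines themselves instead
text <- paste(raw_lines, collapse = " ")

# Replace non-word characters and digits with spaces (note the backslashes)
text <- gsub("\\W", " ", text)
text <- gsub("\\d", " ", text)

text <- tolower(text)
text
```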

