Topic modeling issues with R

Problem description

I am fairly new to sentiment analysis and am having trouble finding topics in my text. I have computed the overall sentiment, but I would like to break it down by topic. I have cleaned up the documents and built a DTM. Searching the web, I read that the LDA function should do what I am asking for, and it sort of does, but I always get duplicate results, like this:

lda <- LDA(dtm, k = 10)
terms(lda)
Topic 1      Topic 2      Topic 3      Topic 4      Topic 5      Topic 6      Topic 7 
"quality"    "quality"     "headphones"     "headphones"     "headphones" "microphone"    "quality" 
 Topic 8      Topic 9     Topic 10 
"microphone" "microphone"   "product" 

Also, I read somewhere that a topic should contain more than one word; how can I find those words?

Thanks to everyone in advance

Tags: r

Solution


A topic is a distribution over a vocabulary, so yes, each topic has a probability for every word in the vocabulary you analyzed. I believe that vocabulary is taken to be the set of all unique terms (words) across your documents.
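
If you want to see those probabilities directly, here is a minimal sketch (assuming lda is the fitted model from your snippet) that pulls the full topic-term probability matrix out of the model with topicmodels::posterior:

beta <- posterior(lda)$terms                  # one row per topic, one column per vocabulary term
dim(beta)                                     # number of topics x vocabulary size
rowSums(beta)                                 # each row sums to 1, i.e. a probability distribution
head(sort(beta[1, ], decreasing = TRUE), 5)   # top 5 words of topic 1, with their probabilities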

The help file for topicmodels::topics also covers topicmodels::terms. You want to specify the second argument, k, to indicate how many terms to return per topic. The default is k = 1, which is why you only see the top term for each topic.

topicmodels::terms(lda, k = 5)

The above line should return the top 5 terms per topic.
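
If you would rather select terms by a probability cutoff instead of a fixed count, terms() also accepts a threshold argument; the 0.01 cutoff below is only an illustration, not a recommended value:

topicmodels::terms(lda, threshold = 0.01)   # all terms whose per-topic probability exceeds 0.01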

When I first learned topic modeling, I found this resource helpful: https://www.tidytextmining.com/topicmodeling.html
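
That resource uses the tidytext workflow; as a rough sketch along the same lines (assuming the tidytext and dplyr packages are installed and lda is your fitted model), you could pull the top 5 terms per topic together with their probabilities like this:

library(tidytext)
library(dplyr)

# tidy() reshapes the model into one row per (topic, term) pair,
# with the per-topic word probability in the beta column
topic_terms <- tidy(lda, matrix = "beta")

top_terms <- topic_terms %>%
  group_by(topic) %>%
  slice_max(beta, n = 5) %>%   # keep the 5 highest-probability terms in each topic
  ungroup() %>%
  arrange(topic, desc(beta))

top_terms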

I hope that my suggestions are helpful.

