r - Topic modeling issues with R
Problem description
I am fairly new to sentiment analysis and am having trouble finding topics in my text. I have computed the overall sentiment, but I would like to break it down by topic. I have cleaned the documents and built a DTM. From what I read online, the LDA function should do what I need, and it more or less does, but I always get duplicate results like this:
lda <- LDA(dtm, k = 10)
terms(lda)
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Topic 6 Topic 7
"quality" "quality" "headphones" "headphones" "headphones" "microphone" "quality"
Topic 8 Topic 9 Topic 10
"microphone" "microphone" "product"
Also, I read somewhere that each topic should contain more than one word. How can I find those?
Thanks in advance, everyone.
Solution
A topic is a probability distribution over a vocabulary, so yes, each topic assigns a probability to every word in the vocabulary you analyzed. I believe that vocabulary is taken to be the set of all unique terms (words) across your documents.
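To make the "distribution over a vocabulary" idea concrete, here is a minimal sketch. It assumes the AssociatedPress example data that ships with topicmodels stands in for your own dtm, and the names lda_small and topic_word are made up for illustration. It pulls the topic-by-term probability matrix with posterior() and checks that each topic's probabilities sum to roughly 1.

```r
library(topicmodels)

# Assumption: the AssociatedPress example DTM stands in for your own `dtm`.
data("AssociatedPress", package = "topicmodels")

# Fit a small model on a few documents; the seed only makes the run repeatable.
lda_small <- LDA(AssociatedPress[1:20, ], k = 3, control = list(seed = 1234))

# posterior()$terms is a topics-by-terms matrix of word probabilities.
topic_word <- posterior(lda_small)$terms
dim(topic_word)       # number of topics x size of the vocabulary
rowSums(topic_word)   # each row is a distribution, so each sum is about 1

# The five most probable words of topic 1, with their probabilities.
head(sort(topic_word[1, ], decreasing = TRUE), 5)
```

Every word in the vocabulary gets some probability in every topic; the top terms are simply the largest entries in each row.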
The help file for topicmodels::topics has information on topicmodels::terms, too. You want to specify a second argument, k, to indicate the number of terms to return. The default is k = 1, which is why you see only the top term per topic.
topicmodels::terms(lda, k = 5)
The above line should return the top 5 terms per topic.
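As a rough, self-contained illustration of that call, the sketch below again assumes the AssociatedPress example data in place of your dtm (lda_demo and top_terms are illustrative names). It compares the default output with k = 5.

```r
library(topicmodels)

# Assumption: example data bundled with topicmodels, used in place of your own DTM.
data("AssociatedPress", package = "topicmodels")
lda_demo <- LDA(AssociatedPress[1:20, ], k = 2, control = list(seed = 1234))

# Default: only the single most probable term per topic.
terms(lda_demo)

# k = 5: a character matrix with 5 rows and one column per topic.
top_terms <- terms(lda_demo, k = 5)
top_terms

# Count how many topics share the same top-ranked term.
table(top_terms[1, ])
```

Looking at several terms per topic may also show whether topics that share a top word really are near-duplicates or differ further down the list.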
When I first learned topic modeling, I found this resource helpful: https://www.tidytextmining.com/topicmodeling.html
I hope that my suggestions are helpful.