tf-idf - How to know if your TF-IDF calculation is correct?
Problem description
Background Info: I'm just getting started with NLP. I have gone through the materials for a basic CS course, watched some videos, and read a bit...
My Approach: Take a specific technique I've learned, write my own code for it (without using existing tools yet), run it against some texts, and see whether I get the results right or almost right.
My Challenge: I don't know where it is appropriate to post questions about my learning and get informative responses.
The first technique I'm learning is TF-IDF. The goal is to extract the most important and central ideas of a text.
Text URL: https://news.yahoo.com/oxford-scientists-working-coronavirus-vaccine-100847254.html
Using TF, I get the following top "concepts" or "words" for this text (with respective scores):
oxford, 7
scientists, 6
coronavirus, 4
vaccine, 14
falling, 5
adam, 3
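For reference, a minimal sketch of how raw term-frequency counts like the ones above could be produced. This is not the exact code behind those numbers: the tokenizer, the tiny `STOP_WORDS` set, and the function names are assumptions made for illustration, and the sample text is just two sentences from the article rather than the full text.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop word list; a real one would be larger.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "at", "to", "is", "are",
              "and", "say", "that", "with", "now"}

def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_frequencies(text, stop_words=STOP_WORDS):
    """Return raw counts for every token that is not a stop word."""
    return Counter(t for t in tokenize(text) if t not in stop_words)

sample = (
    "Oxford scientists working on a coronavirus vaccine say the chances "
    "of success are now 50%. Scientists at Oxford are working with global "
    "pharmaceutical company AstraZeneca Plc to produce the vaccine."
)

tf = term_frequencies(sample)
# Print the terms with the highest raw counts in the sample.
for term, count in tf.most_common(5):
    print(term, count)
```

Raw counts depend heavily on tokenization and on the stop word list, so two implementations can both be reasonable and still disagree on a few words.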
With a slightly different calculation formula, I get the following top "words":
scientists
coronavirus
vaccine
uk
falling
bell
might (this one is obviously a false positive, but I don't want to add it to the stop word list, since its noun form would be meaningful in another context)
Using TF-IDF, I get the following top three "sentences":
(1) Oxford scientists working on a coronavirus vaccine say the chances of success are now 50%
(2) Scientists at Oxford are working with global pharmaceutical company AstraZeneca Plc to produce the vaccine
(3) They say that's because the number of people with the virus in the UK is falling too quickly
Their respective scores:
22.532130774077
17.891164215124
16.190527222538
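For comparison, a minimal sketch of one common way to score sentences with TF-IDF. It assumes each sentence is treated as its own document, uses raw counts for TF and log(N / df) for IDF, and sums the weights of a sentence's terms. These choices and the names `tokenize` and `score_sentences` are assumptions for illustration, not the formula behind the scores above, so the numbers it produces will not match them.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_sentences(sentences):
    """Score each sentence as the sum of TF-IDF weights of its terms.

    Every sentence is treated as one document:
      tf  = raw count of the term in the sentence
      idf = log(N / df), where N is the number of sentences and
            df is the number of sentences containing the term
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)

    # Document frequency: in how many sentences does each term occur?
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    scores = []
    for doc in docs:
        score = sum(count * math.log(n / df[term])
                    for term, count in doc.items())
        scores.append(score)
    return scores

sentences = [
    "Oxford scientists working on a coronavirus vaccine say the chances of success are now 50%",
    "Scientists at Oxford are working with global pharmaceutical company AstraZeneca Plc to produce the vaccine",
    "They say that's because the number of people with the virus in the UK is falling too quickly",
]

for sentence, score in zip(sentences, score_sentences(sentences)):
    print(round(score, 3), sentence)
```

With this particular formula, a term that appears in every sentence gets an IDF of zero and contributes nothing, and longer sentences tend to score higher because the weights are summed rather than averaged. Differences in IDF smoothing, normalization, or stop word handling are enough to explain different absolute scores between implementations.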
Could anyone run some tests against this text and share the results you get? By comparing my results with those of an expert, I would know how I'm doing.
Thanks in advance.
Solution