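marklogic - Increase the score based on the maximum number of matching words in MarkLogic
Question
I want to search for the documents that match the most words from a string. Documents with more matching words should score higher. For example, I have the string "Today is my birthday" and the following documents: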
{
"Id": "Doc1",
"TitleName": "my birthday"
}
{
"Id": "Doc2",
"TitleName": "birthday"
}
{
"Id": "Doc3",
"TitleName": "Today is my teacher's birthday"
}
{
"Id": "Doc4",
"TitleName": "Holiday"
}
In this case, Doc3 should get the highest score, followed by Doc1, then Doc2.
Solution
That is what you get out of the box with the default scoring and relevance-based result sorting. If you supply a sequence of cts:word-query() items within a cts:or-query(), the logtfidf relevance calculation is applied and returns the documents in the order that you want.
log(tf)*idf Calculation
The logtfidf method of relevance calculation is the default, and it corresponds to the score-logtfidf option of cts:search. The logtfidf method takes into account term frequency (how often a term occurs in a single fragment) and document frequency (in how many documents the term occurs) when calculating the score. Most search engines use a relevance formula derived from some computation that takes into account term frequency and document frequency.
The logtfidf method (the default scoring method) uses the following formula to calculate relevance:
log(term frequency) * (inverse document frequency)
So, with this search:
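(: Split the phrase into words and build one word query per word :)
let $phrase := "Today is my birthday"
let $word-queries := tokenize($phrase, " ") ! cts:word-query(.)
return
  (: OR the word queries together; results come back in relevance order :)
  cts:search(doc(), cts:or-query($word-queries))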
This returns the documents in the order Doc3, Doc1, Doc2.
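To see why an OR of word queries produces that order, here is a rough standalone sketch in Python of a log(tf) * idf style sum over the four documents from the question. It is only a simplified model and not MarkLogic's internal scoring: the whitespace tokenizer, the `log(1 + tf)` smoothing, the `log(N / df)` form of inverse document frequency, and the helper names `tokenize`, `score`, and `ranked` are all assumptions made for illustration.

```python
import math

# Toy corpus copied from the question; the scoring below is a simplified
# model, not MarkLogic's internal implementation.
DOCS = {
    "Doc1": "my birthday",
    "Doc2": "birthday",
    "Doc3": "Today is my teacher's birthday",
    "Doc4": "Holiday",
}


def tokenize(text):
    # Naive lowercase whitespace split (assumption; MarkLogic tokenizes differently).
    return text.lower().split()


def score(query, docs):
    """Return {doc_id: score} as a sum of smoothed log(tf) * idf over query words."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    scores = {}
    for doc_id, words in tokens.items():
        total = 0.0
        for term in set(tokenize(query)):
            tf = words.count(term)  # occurrences of the term in this document
            if tf == 0:
                continue
            # number of documents that contain the term
            df = sum(1 for w in tokens.values() if term in w)
            # log(1 + tf) keeps a single occurrence from scoring zero (assumption).
            total += math.log(1 + tf) * math.log(n_docs / df)
        scores[doc_id] = total
    return scores


def ranked(query, docs):
    # Matching documents only, highest score first.
    scores = score(query, docs)
    return sorted((d for d in scores if scores[d] > 0), key=lambda d: -scores[d])


print(ranked("Today is my birthday", DOCS))  # ['Doc3', 'Doc1', 'Doc2']
```

Each matched word adds its own contribution, so Doc3 (four matching words, two of them rare) outranks Doc1 (two matches), which outranks Doc2 (one match on the most common word). Doc4 matches nothing and is not returned.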