Increase score based on the maximum number of matched words in MarkLogic

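Question

I want to search for the documents that match the most words from a string. Documents with more matching words should score higher. For example, I have the string "Today is my birthday" and the following documents: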

{
  "Id": "Doc1",
  "TitleName": "my birthday"
}
{
  "Id": "Doc2",
  "TitleName": "birthday"
}
{
  "Id": "Doc3",
  "TitleName": "Today is my teacher's birthday"
}
{
  "Id": "Doc4",
  "TitleName": "Holiday"
}

In this case, Doc3 should get the highest score, followed by Doc1 and then Doc2.

Tags: marklogic

Answer


That is what you get out of the box with the default scoring and relevance-based result ordering. If you supply a sequence of cts:word-query() items within a cts:or-query(), the logtfidf relevance calculation is applied and returns the documents in the order that you want.

log(tf)*idf Calculation

The logtfidf method is the default relevance calculation and corresponds to the score-logtfidf option of cts:search. It takes into account term frequency (how often a term occurs in a single fragment) and document frequency (how many documents the term occurs in) when calculating the score. Most search engines use a relevance formula derived from some combination of term frequency and document frequency.
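
To make the scoring method explicit instead of relying on the default, the option can be passed as the third argument of cts:search. This is a minimal sketch, assuming the sample documents from the question are loaded in the database being queried; the variable name $query is only illustrative.

```xquery
xquery version "1.0-ml";

(: An or-query over the individual words of the phrase :)
let $query := cts:or-query((
  cts:word-query("Today"),
  cts:word-query("is"),
  cts:word-query("my"),
  cts:word-query("birthday")
))
return
  (: "score-logtfidf" is the default, spelled out here for clarity :)
  cts:search(fn:doc(), $query, "score-logtfidf")
```

Other scoring options can be swapped in at the same position. For instance, "score-simple" is intended to score on which query terms match rather than on how often each one occurs, which may be closer to a plain "count of matching words" ranking; it is worth comparing both on your own data.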

The logtfidf method (the default scoring method) uses the following formula to calculate relevance:

log(term frequency) * (inverse document frequency)

So, with this search:

let $phrase := "Today is my birthday"
(: build one cts:word-query per word of the phrase :)
let $word-queries := tokenize($phrase, " ") ! cts:word-query(.)
return
  cts:search(doc(), cts:or-query($word-queries))

It returns the documents in the order Doc3, Doc1, Doc2. Doc4 ("Holiday") contains none of the words and is not returned.
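
To check the ordering against your own data, the score of each result can be read with cts:score. This is a sketch under assumptions: the four sample documents are loaded as JSON, and Id is a top-level property as in the question. The actual score values depend on the contents of the database, so none are shown here.

```xquery
xquery version "1.0-ml";

let $phrase := "Today is my birthday"
let $word-queries := fn:tokenize($phrase, " ") ! cts:word-query(.)
(: results come back in relevance order, highest score first :)
for $doc in cts:search(fn:doc(), cts:or-query($word-queries))
return
  (: cts:score gives the relevance score of a node returned by cts:search :)
  fn:concat($doc/Id, ": ", cts:score($doc))
```

Each line of output pairs a document Id with its score, which makes it easy to see how far apart the matches are and whether a different scoring option changes the ranking.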

