solr - How do I identify why certain documents are returned in a SOLR response when they may not be relevant to the given keyword search?
Problem description
We see certain documents in the WCS SOLR response that are not relevant to the given search keyword, although they are part of the current customer/site catalog and categories. I can see the SOLR debugging information (parser queries, filters, etc.), but I would like to know whether it is possible to tell why a particular document was included in the response. I can also see the explain string, but I haven't spent time understanding the whole algorithm behind it, so I would like to know if there is a quick way to find out why a document is in the result list. This may help identify an issue in the catalog/category data structure or a bug in our SOLR implementation.
Is it possible to see the debug information for each returned document in the response, if that helps in understanding how SOLR is configured and working in the environment?
Thanks,
Solution
When you pass debugQuery=true in the request, the debug/explain node of the response shows the reason for the score of each document. The information will look more or less like this:
...
"debug": {
...
"explain": {
"id:1": "info about the score for document 1",
"id:2": "info about the score for document 2",
"id:3": "info about the score for document 3",
"id:4": "info about the score for document 4",
...
}
}
The information is not exactly easy to parse and decipher but it might be a good place to start.
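As a rough illustration of reading that node per document, the sketch below pairs each returned document with its entry in debug/explain. It works on an already parsed JSON response, so no Solr server is needed to try it. The function names, the `id_field` default, and the sample data are hypothetical; the lookup tries both the bare unique key value and the `id:1` style shown above, since the exact key format may differ in your environment.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, rows=10):
    """Build a select URL with debugQuery=true (base_url is an assumption)."""
    params = {
        "q": query,
        "rows": rows,
        "fl": "*,score",       # also return the score of each document
        "wt": "json",
        "debugQuery": "true",  # ask Solr to include the debug section
    }
    return base_url + "?" + urlencode(params)


def explain_by_document(response, id_field="id"):
    """Return (doc_id, score, explanation) for each document in the response.

    `response` is the parsed JSON body of a Solr query made with
    debugQuery=true. `id_field` is the unique key field of the index,
    which is assumed to be "id" here.
    """
    explain = response.get("debug", {}).get("explain", {})
    rows = []
    for doc in response.get("response", {}).get("docs", []):
        doc_id = str(doc.get(id_field))
        # Try the bare key first, then the "field:value" form.
        text = explain.get(doc_id)
        if text is None:
            text = explain.get(f"{id_field}:{doc_id}", "")
        rows.append((doc_id, doc.get("score"), text))
    return rows


if __name__ == "__main__":
    # Hypothetical response fragment, shaped like the sample above.
    sample = {
        "response": {"docs": [{"id": "1", "score": 2.5}, {"id": "2", "score": 0.4}]},
        "debug": {"explain": {
            "id:1": "info about the score for document 1",
            "id:2": "info about the score for document 2",
        }},
    }
    for doc_id, score, text in explain_by_document(sample):
        print(doc_id, score, text)
```

Printing the explanation next to each document makes it easier to spot which field and which term produced the match, which is usually the quickest clue to why an unexpected document was returned.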
I explain more about how to read the information in the explain section in this blog post: https://library.brown.edu/DigitalTechnologies/understanding-scoring-of-documents-in-solr/