azure-cognitive-search - Choosing the right analyzer for Azure Search
Problem description
We created the following index in our Azure Search service:
"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
      "lowercase"
    ],
    "charFilters": []
  }
]
This analyzer is assigned to a field named "lowerMachineTag". When we search with the following query, we get the expected results:
Query: search=lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/&$filter=(systemID%20ne%20null)%20and%20(ownerSalesforceRecordID%20eq%20'a0h5B000000gJKfQAM')&$count=true&$top=100&$skip=0
Result:
{
"@odata.context": "https://abcd/indexes('orders-index')/$metadata#docs",
"@odata.count": 4,
"value": [
{
"@search.score": 0.1862714,
"systemID": "*1QXEDL8E2V8MGBY",
"machineTag": "It's me",
"systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
"machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
"lowerMachineTag": "it's me",
"lowerSystemID": "*1qxedl8e2v8mgby",
"ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
"parentSalesforceRecordID": "a0h5B000000gJKfQAM"
},
{
"@search.score": 0.16417237,
"systemID": "*1QXEDL8E2V8MGBY",
"machineTag": "It's me",
"systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
"machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
"lowerMachineTag": "it's me",
"lowerSystemID": "*1qxedl8e2v8mgby",
"ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
"parentSalesforceRecordID": "a0h5B000000gJKfQAM"
},
{
"@search.score": 0.16417237,
"systemID": "*1QXEDL8E2V8MGBY",
"machineTag": "It's me",
"systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
"machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
"lowerMachineTag": "it's me",
"lowerSystemID": "*1qxedl8e2v8mgby",
"ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
"parentSalesforceRecordID": "a0h5B000000gJKfQAM"
},
{
"@search.score": 0.16417237,
"systemID": "*1QXEDL8E2V8MGBY",
"machineTag": "It's me",
"systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
"machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
"lowerMachineTag": "it's me",
"lowerSystemID": "*1qxedl8e2v8mgby",
"ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
"parentSalesforceRecordID": "a0h5B000000gJKfQAM"
}
]
}
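However, in addition to the behavior above, what is the general recommendation for the analyzer configuration if a search for lowerMachineTag:/.it./ should also return results?
Solution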
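It looks like you are using regular expressions in your search query. For that to work, you must also add "&queryType=full" to the query string. Otherwise, the whole search term ("lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/") is interpreted as a simple query, which means it is analyzed with the standard analyzer and matched against any searchable field. With "&queryType=full", the full Lucene syntax is used, so each regular expression is matched only against the field it names.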
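To make the queryType point concrete, here is a minimal Python sketch that assembles the query string with the standard library's urllib.parse. The function name build_search_query is hypothetical, and the sketch covers only the parameters from the question; the other required parts of a REST call (for example the api-version parameter and the api-key header) are deliberately omitted.

```python
from urllib.parse import quote, urlencode


def build_search_query(search: str, odata_filter: str, top: int = 100, skip: int = 0) -> str:
    """Build a query string for a full Lucene syntax search (hypothetical helper)."""
    params = {
        "search": search,
        # Without queryType=full, the field:/regex/ syntax is not interpreted
        # and the text is parsed with the simple query syntax instead.
        "queryType": "full",
        "$filter": odata_filter,
        "$count": "true",
        "$top": str(top),
        "$skip": str(skip),
    }
    # quote (rather than quote_plus) encodes spaces as %20;
    # "$" is kept literal so the OData parameter names stay readable.
    return urlencode(params, quote_via=quote, safe="$")


qs = build_search_query(
    r"lowerMachineTag:/.*it\'s.*/",
    "(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')",
)
print(qs)
```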
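As for your question: if "lowerMachineTag:/.it./" is specified, it will not match any of the four documents above, because the leading "." in the regular expression has to match a character before "it", and in all four documents the value of "lowerMachineTag" starts with "it".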
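If you remove the leading "." and use only "lowerMachineTag:/it./", it still won't match, because a regular expression must match the entire token. With the keyword_v2 tokenizer the whole field value ("it's me") is a single token, so "it." (exactly three characters) cannot match it. Appending '*' would work: "lowerMachineTag:/it.*/".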
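The whole-token rule can be illustrated with Python's re.fullmatch, which likewise requires the pattern to match the entire string. This is only an approximation: Lucene's regular-expression syntax is not identical to Python's, although it behaves the same for these simple patterns. The helper keyword_lowercase_analyzer is a hypothetical stand-in for keyword_v2 plus lowercase, not an Azure API.

```python
import re


def keyword_lowercase_analyzer(text: str) -> list[str]:
    """Hypothetical stand-in for keyword_v2 + lowercase."""
    # keyword_v2 emits the entire input as one token (assuming the value is
    # shorter than the tokenizer's maxTokenLength); lowercase folds the case.
    return [text.lower()]


def regex_matches(pattern: str, tokens: list[str]) -> bool:
    """A regex query must match a whole token, like re.fullmatch."""
    return any(re.fullmatch(pattern, tok) for tok in tokens)


tokens = keyword_lowercase_analyzer("It's me")
for pattern in [r".*it's.*", r".it.", r"it.", r"it.*"]:
    print(f"/{pattern}/ ->", regex_matches(pattern, tokens))
```

Only "/.*it's.*/" and "/it.*/" match the single token "it's me"; "/.it./" and "/it./" need a token of exactly four and three characters respectively.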
You could also make "/it./" work by changing the analyzer definition to use an nGram_v2 token filter, as follows:
"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
      "lowercase", "myNGramTokenFilter"
    ],
    "charFilters": []
  }
],
"tokenFilters": [
  {
    "name": "myNGramTokenFilter",
    "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
    "minGram": 1,
    "maxGram": 100
  }
]
Your original query (with "&queryType=full" added) still returns the same results, and "lowerMachineTag:/it./" now returns results as well.
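Why the nGram filter helps can be sketched the same way. A hypothetical ngram_filter function emulates the token filter by emitting every substring whose length lies between minGram and maxGram, so a regex query only has to fully match one of those smaller tokens. Again, this is an approximation of Lucene's behavior, not the service's implementation.

```python
import re


def ngram_filter(tokens: list[str], min_gram: int = 1, max_gram: int = 100) -> list[str]:
    """Approximate an nGram token filter: emit every substring of each token
    whose length lies between min_gram and max_gram (inclusive)."""
    grams = []
    for tok in tokens:
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            for start in range(len(tok) - n + 1):
                grams.append(tok[start:start + n])
    return grams


def regex_matches(pattern: str, tokens: list[str]) -> bool:
    """A regex query hits a document if it fully matches any indexed token."""
    return any(re.fullmatch(pattern, tok) for tok in tokens)


# keyword_v2 + lowercase yields the single token "it's me"; the filter expands it.
grams = ngram_filter(["it's me"])
print(len(grams))
for pattern in [r".*it's.*", r"it.", r".it."]:
    print(f"/{pattern}/ ->", regex_matches(pattern, grams))
```

"/it./" now matches the 3-gram "it'", while "/.it./" still finds nothing because no character precedes "it" in "it's me". Note that a value of n characters yields n(n+1)/2 tokens when maxGram is at least n (28 for "it's me"), so a large maxGram on long fields noticeably increases index size.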
I hope this helps!