Choosing the right analyzer for Azure Search

Problem description

We created an index in the Azure Search service with the following analyzer definition:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase"
    ],
    "charFilters": []
}
]

This analyzer is assigned to a field named "lowerMachineTag". When we search with the following query, we get the expected results:

Query: search=lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/&$filter=(systemID%20ne%20null)%20and%20(ownerSalesforceRecordID%20eq%20'a0h5B000000gJKfQAM')&$count=true&$top=100&$skip=0

Result:

{
    "@odata.context": "https://abcd/indexes('orders-index')/$metadata#docs",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 0.1862714,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        }
    ]
}

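However, what is the general recommendation for the analyzer configuration if searching for lowerMachineTag:/.it./ should also return results, in addition to the behavior above?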

Tags: azure-cognitive-search

Solution


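It looks like you are using regular expressions in your search query. For that to work, you must also add "&queryType=full" to the query string. Otherwise, the whole search term ("lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/") is interpreted as a simple query, which means it is analyzed with the standard analyzer and matched against every searchable field. With "&queryType=full", each regular expression is matched only against the field it names.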
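As a rough sketch of how such a query string could be assembled, the Python snippet below URL-encodes the parameters from the question and adds queryType=full. The helper name build_query_string is hypothetical, the search expression is copied from the question unchanged, and the service endpoint and API version are deliberately left out.

```python
from urllib.parse import urlencode, parse_qs


def build_query_string(search: str, odata_filter: str) -> str:
    """Hypothetical helper: URL-encode the search parameters with full Lucene syntax enabled."""
    params = {
        "search": search,
        "queryType": "full",  # without this, the term is parsed as a simple query
        "$filter": odata_filter,
        "$count": "true",
        "$top": 100,
        "$skip": 0,
    }
    return urlencode(params)


# Search expression and filter taken from the question.
SEARCH = r"lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/"
FILTER = "(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')"

query_string = build_query_string(SEARCH, FILTER)
print(query_string)
```

The resulting string would be appended after the "?" of the index's docs endpoint, alongside whatever API version the service requires.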

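As for your question: if you specify "lowerMachineTag:/.it./", it will not match any of the four documents above, because the "." at the start of the regular expression tries to match a character before "it", and in all four documents the value of "lowerMachineTag" starts with "it".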

If you remove the leading "." and use only "lowerMachineTag:/it./", it still will not match, because the regular expression must match the entire token (adding a '*' would work: "lowerMachineTag:/it.*/").
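The whole-token rule can be illustrated with a small Python sketch. Python's re module is not the Lucene regex engine, so this only approximates the anchoring behavior with re.fullmatch; the helper name matches_whole_token is hypothetical, and the token is the lowercased field value from the question.

```python
import re


def matches_whole_token(pattern: str, token: str) -> bool:
    """Approximate the anchoring rule: the pattern must cover the entire token."""
    return re.fullmatch(pattern, token) is not None


# keyword_v2 + lowercase keeps the whole field value as a single token.
token = "it's me"

for pattern in (r".it.", r"it.", r"it.*", r".*it's.*"):
    print(f"/{pattern}/ matches: {matches_whole_token(pattern, token)}")
```

Only the last two patterns cover the full token, which is why the original query works while the two shorter patterns return nothing.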

You can also change the analyzer definition to use the nGram_v2 token filter so that "/it./" works, as follows:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase", "myNGramTokenFilter"
    ],
    "charFilters": []
}
],
"tokenFilters": [
   {
      "name": "myNGramTokenFilter",
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
      "minGram": 1,
      "maxGram": 100
   }
]

With this definition, your original query (plus "queryType=full") still returns the same results, and "lowerMachineTag:/it./" returns results as well.
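To see why the n-gram filter helps, the following Python sketch emulates the idea rather than the service itself: it expands the single lowercased token into all substrings between minGram and maxGram characters and checks which of them the pattern covers in full. The function char_ngrams is a hypothetical stand-in for the token filter, and re.fullmatch only approximates Lucene's whole-token regex matching.

```python
import re


def char_ngrams(token: str, min_gram: int = 1, max_gram: int = 100) -> set[str]:
    """Return every substring of token whose length is within [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


# Lowercased keyword token from the question, expanded into n-grams.
grams = char_ngrams("it's me")

# Each n-gram is indexed as its own token, so a short pattern can now cover one of them.
matching = sorted(g for g in grams if re.fullmatch(r"it.", g))
print(matching)
```

The three-character gram "it'" is what lets "/it./" match. The trade-off is index size: a long field value produces many n-grams per document.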

I hope this helps!

