How do I get Azure Search results that contain a complete word at least once?

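Problem description

I am testing the following query in the Azure Search portal, but it does not return the results I expect. I want every document in which the word "algo" appears at least once.

search=algo&queryType=full&searchMode=any

Important: MyVal is searchable and uses the Lucene analyzer (Spanish).

Example of an expected result: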

{
    "@odata.context": "https://....windows.net/indexes(....)/$metadata#docs(*)",
    "value": [
        {
            "MyKey":"1",
            "MyValues":[
                {
                    "MyVal":"algo aqui"
                },
                {
                    "MyVal":"lala"
                }
            ]
        }
    ]
}

Example of a result that is not expected:

{
    "@odata.context": "https://....windows.net/indexes(....)/$metadata#docs(*)",
    "value": [
        {
            "MyKey":"1",
            "MyValues":[
                {
                    "MyVal":"algoOtherStuff aqui"
                },
                {
                    "MyVal":"lala"
                }
            ]
        }
    ]
}

Result actually returned:

{
    "@odata.context": "https://....windows.net/indexes(....)/$metadata#docs(*)",
    "value": []
}

More example queries and results

search=algo* &queryType=full&searchMode=any

[No results]


search=/.*algo.*/&queryType=full&searchMode=any

[No results]


search=algo aqui &queryType=full&searchMode=any

[Expected result!!!] (element found)


search= aqui &queryType=full&searchMode=any

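[Expected result!!!] (element found)


Important: if I replace the text with two other words for testing, such as "some data" or "something special", and search for either one of them, Azure Search returns the expected result. The problem seems to be specific to the word "algo".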

Tags: azure-cognitive-search

Solution


OK, I was able to reproduce the problem with the following code:

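using System.Collections.Generic;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

var client = new SearchServiceClient("xxxx", new SearchCredentials("abcabc"));

// Create an index whose searchable sub-field uses the Spanish Lucene analyzer.
client.Indexes.Create(new Microsoft.Azure.Search.Models.Index
{
    Name = "index",
    Fields = new List<Field>
    {
        new Field("Id", DataType.String) { IsKey = true, IsRetrievable = true, IsFilterable = true },
        Field.NewComplex("MyValues", true, new List<Field>
        {
            new Field("MyVal", DataType.String)
            {
                IsRetrievable = true,
                IsFilterable = true,
                IsSearchable = true,
                Analyzer = AnalyzerName.EsLucene
            }
        })
    }
});

// Upload two test documents; only the first one contains "algo".
var docs = new List<CustomDoc>
{
    new CustomDoc { Id = "1", MyValues = new MyValues[] { new MyValues { MyVal = "algo aqui" }, new MyValues { MyVal = "lala" } } },
    new CustomDoc { Id = "2", MyValues = new MyValues[] { new MyValues { MyVal = "something else" }, new MyValues { MyVal = "xxx" } } },
};

var indexClient = client.Indexes.GetClient("index");
indexClient.Documents.Index(IndexBatch.Upload(docs));

// The document types are not shown in the original answer; these shapes are assumed from the usage above.
public class MyValues { public string MyVal { get; set; } }
public class CustomDoc { public string Id { get; set; } public MyValues[] MyValues { get; set; } }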

Yes, you are right. "algo" is treated as a stop word by the Lucene analyzer for Spanish (AnalyzerName.EsLucene):

https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
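To see why this produces the results in the question, here is a rough sketch of what a stop-word filter does. It is a toy model, not the real es.lucene analyzer (no stemming, no accent handling): the class StopWordDemo, its methods, and the five-word stop list are made up for illustration. The point is that "algo" is dropped both when the document is indexed and when the query is analyzed, so a query for "algo" alone has no terms left to match, while "algo aqui" still matches through "aqui".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model of an analyzer with a stop-word filter.
// Hypothetical names; this is not the Azure Search or Lucene API.
public static class StopWordDemo
{
    // Small excerpt of Spanish stop words, for illustration only.
    private static readonly HashSet<string> StopWords =
        new HashSet<string> { "algo", "de", "la", "que", "el" };

    // Lowercase the text, split on spaces, and drop stop words.
    public static List<string> Analyze(string text) =>
        text.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(term => !StopWords.Contains(term))
            .ToList();

    // Rough equivalent of searchMode=any: the field matches if at least
    // one term that survives query analysis is among its indexed terms.
    public static bool Matches(string query, string fieldValue)
    {
        var indexedTerms = new HashSet<string>(Analyze(fieldValue));
        return Analyze(query).Any(indexedTerms.Contains);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("algo", "algo aqui"));      // False: the query has no terms left
        Console.WriteLine(Matches("algo aqui", "algo aqui")); // True: matches on "aqui"
        Console.WriteLine(Matches("aqui", "algo aqui"));      // True
    }
}
```

This mirrors the pattern in the question: the queries that include "aqui" find the document, and the query for "algo" alone finds nothing.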

After switching to the EsMicrosoft analyzer, a search for "algo" returns the document:

client.Indexes.Create(new Microsoft.Azure.Search.Models.Index
{
    Name = "index",
    Fields = new List<Field>
    {
        new Field("Id", DataType.String) { IsKey = true, IsRetrievable = true, IsFilterable = true },
        Field.NewComplex("MyValues", true, new List<Field>
        {
            new Field("MyVal", DataType.String)
            {
                IsRetrievable = true,
                IsFilterable = true,
                IsSearchable = true,
                Analyzer = AnalyzerName.EsMicrosoft
            }
        })
    }
});
