Elasticsearch synonym issue

Problem description

I've looked at the other questions about this problem, but they don't seem to help.

I need an input of "i phone" or "i Phone" to be queried as "iPhone" in Elasticsearch.

As you can see in the code below, I have tried almost everything I can think of, including simply using "phone => iPhone" and leaving the "i" to hang around, possibly adding it to the stopwords.

I've tried using "simple", "keyword", "standard" and "whitespace" as the tokenizer for my custom analyzer.

Can anyone spot where I've gone wrong? This is the last problem before I can finish my project, so any help would be appreciated. Thanks.

P.S. Bonus points if you include how I can do auto-suggest on inputs. Thanks.

Below is my code

public static CreateIndexDescriptor GetMasterProductDescriptor(string indexName = "shopmaster")
        {
            var indexDescriptor = new CreateIndexDescriptor(indexName)
                .Settings(s => s
                            .Analysis(a => a
                                .TokenFilters(t => t
                                    .Stop("my_stop", st => st
                                        .StopWords("_english_", "new", "cheap")
                                        .RemoveTrailing()
                                    )
                                    .Synonym("my_synonym", st => st
                                        .Synonyms(
                                            "phone => iPhone"
                                        //"i phone => iPhone",
                                        //"i Phone => iPhone"
                                        )
                                    )
                                    .Snowball("my_snowball", st => st
                                        .Language(SnowballLanguage.English)
                                    )
                                )
                                .Analyzers(an => an
                                    .Custom("my_analyzer", ca => ca
                                        .Tokenizer("simple")
                                        .Filters(
                                            "lowercase",
                                            "my_stop",
                                            "my_snowball",
                                            "my_synonym"
                                        )
                                    )
                                )
                            )
                        )
                .Mappings(
                    ms => ms.Map<MasterProduct>(
                        m => m.AutoMap()
                            .Properties(
                                ps => ps
                                    .Nested<MasterProductAttributes>(p => p.Name(n => n.MasterAttributes))
                                    .Nested<MasterProductAttributes>(p => p.Name(n => n.ProductAttributes))
                                    .Nested<MasterProductAttributeType>(p => p.Name(n => n.MasterAttributeTypes))
                                    .Nested<Feature>(p => p.Name(n => n.Features))
                                    .Nested<RelatedProduct>(p => p.Name(n => n.RelatedProducts))
                                    .Nested<MasterProductItem>(
                                        p => p.Name(
                                                n => n.Products
                                            )
                                            .Properties(prop => prop.Boolean(
                                                b => b.Name(n => n.InStock)
                                            ))
                                    )
                                    .Boolean(b => b.Name(n => n.InStock))
                                    .Number(t => t.Name(n => n.UnitsSold).Type(NumberType.Integer))
                                    .Text(
                                        tx => tx.Name(e => e.ManufacturerName)
                                            .Fields(fs => fs.Keyword(ss => ss.Name("manufacturer"))
                                                    .TokenCount(t => t.Name("MasterProductId")
                                                            .Analyzer("my_analyzer")
                                                    )
                                            )
                                            .Fielddata())
                                    //.Completion(cm=>cm.Analyzer("my_analyser")
                                    )
                    )
                );
            return indexDescriptor;
        }

Tags: elasticsearch, lucene, nest

Solution

The order of the filters is important!

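You are applying lowercase, then a stemmer (Snowball), then synonyms. Your synonyms contain capital letters, but by the time they are applied, lowercasing has already occurred. Applying lowercase first is a good idea, since it ensures that case doesn't affect synonym matching, but in that case your replacements shouldn't contain capitals.

Stemmers should not be applied before synonyms (unless you know what you're doing and are comparing post-stemming terms). Snowball, I believe, will transform "iphone" into "iphon", so this is another place where you are running into trouble.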

"lowercase",
"my_synonym",
"my_stop",
"my_snowball",

(And don't forget to remove the capital letters from your synonyms.)
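
Putting both changes together, the analysis part of the index settings might look roughly like the sketch below. It reuses the filter names from the question and only changes the filter order and the synonym rule. A few things here are assumptions rather than part of the original answer: the class and method names (`MasterProductIndex`, `GetAnalysisOnlyDescriptor`) are hypothetical, the "standard" tokenizer is just one of the tokenizers the question mentions trying, and the rule "i phone => iphone" is the question's commented-out rule with the capitals removed. The mappings are left out for brevity.

```csharp
using Nest;

public static class MasterProductIndex
{
    // Hypothetical helper: builds only the analysis settings, with the
    // filters reordered as suggested above (lowercase, synonyms, stop, stemmer).
    public static CreateIndexDescriptor GetAnalysisOnlyDescriptor(string indexName = "shopmaster")
    {
        return new CreateIndexDescriptor(indexName)
            .Settings(s => s
                .Analysis(a => a
                    .TokenFilters(t => t
                        .Stop("my_stop", st => st
                            .StopWords("_english_", "new", "cheap")
                            .RemoveTrailing()
                        )
                        .Synonym("my_synonym", st => st
                            .Synonyms(
                                // Lowercase on both sides, because the
                                // lowercase filter runs before this one.
                                "i phone => iphone"
                            )
                        )
                        .Snowball("my_snowball", st => st
                            .Language(SnowballLanguage.English)
                        )
                    )
                    .Analyzers(an => an
                        .Custom("my_analyzer", ca => ca
                            // Assumption: "standard" is used here only as an example tokenizer.
                            .Tokenizer("standard")
                            .Filters(
                                "lowercase",   // 1. normalize case first
                                "my_synonym",  // 2. match synonyms on lowercased, unstemmed tokens
                                "my_stop",     // 3. drop stop words
                                "my_snowball"  // 4. stem last
                            )
                        )
                    )
                )
            );
    }
}
```

The index has to be recreated (or the documents reindexed) for a changed analyzer to affect already indexed text.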
