How to implement SpanQuery with MultiFieldQuery in Java using Lucene

Problem description

I want to combine SpanQuery with a multi-field search (MultiFieldQueryParser) for fuzzy phrase matching, but I am running into problems.

I have tried using MultiFieldQueryParser with BooleanQuery. It only works partially: it can find fuzzy matches, but the phrase does not respect any slop. For example, my index contains the text "Check out these". When I search for "Check out", it returns a hit and shows "Check out these". This is the result I want. However, when I search for "Check these", it also returns a hit and shows "Check out these". In this case the search should fail, because "out" sits between the two words.

I have also tried using SpanQuery. The scenario above does not happen with this method. However, I can only search one field, whereas I want to search multiple fields.

private static TopDocs searchInFuzzyPhrase(String textToFind, String textToFind1, IndexSearcher searcher, int slop)
        throws Exception {
    Analyzer analyzer = new StandardAnalyzer();

    // Attempt 1: multi-field parser, one fuzzy clause per word, both required.
    // Both words must match, but their relative positions are not checked.
    MultiFieldQueryParser parser = new MultiFieldQueryParser(new String[]
    { "FULL_NAME", "BRAND_NAME", "DISPLAY_NAME", "DISPLAY_NAME_SYNONYM" }, analyzer);
    parser.setPhraseSlop(slop);
    BooleanQuery bQuery = new BooleanQuery.Builder()
            .add(parser.parse(textToFind + "~"), BooleanClause.Occur.MUST)
            .add(parser.parse(textToFind1 + "~"), BooleanClause.Occur.MUST)
            .build();

    // Attempt 2: ordered span query over fuzzy terms, limited to a single field.
    SpanQuery[] clauses = new SpanQuery[2];
    clauses[0] = new SpanMultiTermQueryWrapper<>(new FuzzyQuery(new Term("DISPLAY_NAME", textToFind)));
    clauses[1] = new SpanMultiTermQueryWrapper<>(new FuzzyQuery(new Term("DISPLAY_NAME", textToFind1)));
    SpanNearQuery sQuery = new SpanNearQuery(clauses, slop, true);

    // Only one of the two queries is run at a time: pass bQuery or sQuery here.
    TopDocs hits = searcher.search(bQuery, 1);
    return hits;
}

Using the earlier example, "Check out these": when I search for "Check these" using MultiFieldQueryParser + BooleanQuery, it returns a hit, which is not what I want.

When I search for "Check these" using SpanQuery, it returns a miss. This is partially what I want, but it only applies to one field. I am trying to apply it to many fields.

Tags: java, lucene, fuzzy-search

Solution


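The problem here is that spans only work within a single field. That is understandable, because positions have little meaning across different fields.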

You need to follow the same code you already have and extend it to cover your whole list of fields.

For example, for every string in your list "FULL_NAME", "BRAND_NAME", "DISPLAY_NAME", "DISPLAY_NAME_SYNONYM", create a SpanQuery as you do in your example, then combine them all into one BooleanQuery with Occur.SHOULD.
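The approach above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name MultiFieldSpanQuery and the method buildQuery are hypothetical, and the imports assume Lucene 8.x or earlier, where the span classes live in org.apache.lucene.search.spans (in Lucene 9 they moved to org.apache.lucene.queries.spans).

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;

// Hypothetical helper class, introduced only for this illustration.
final class MultiFieldSpanQuery {

    private MultiFieldSpanQuery() {
    }

    // Builds one ordered, fuzzy SpanNearQuery per field and ORs them together.
    // The term text is not analyzed here; lower-case it yourself if the fields
    // were indexed with a lower-casing analyzer such as StandardAnalyzer.
    static Query buildQuery(String[] fields, String first, String second, int slop) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String field : fields) {
            SpanQuery[] clauses = new SpanQuery[] {
                new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term(field, first))),
                new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term(field, second)))
            };
            // inOrder = true: "first" has to appear before "second" in this field.
            SpanNearQuery perField = new SpanNearQuery(clauses, slop, true);
            // SHOULD: a document matches if at least one field satisfies its span query.
            builder.add(perField, BooleanClause.Occur.SHOULD);
        }
        return builder.build();
    }
}
```

The result can be passed to the existing search call, for example searcher.search(MultiFieldSpanQuery.buildQuery(fields, "check", "these", 0), 1). With a slop of 0, a field containing "Check out these" should not match "check" followed by "these", because "out" sits between them, while "check" followed by "out" should match. Each span is still evaluated inside a single field, so the two words must occur in the same field for a document to be returned.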

