java - Apache Lucene indexer search using CJKAnalyzer
Problem description
I am using the Apache Lucene indexer to search text, with CJKAnalyzer. It matches the search word character by character. For example, if I search for the Japanese word "ぁxまn", it returns every word that contains any character of that word.
I don't want this. I want to match the whole word, or words that contain the given word.
For example, suppose I have indexed 3 words: "ぁxまn", "ぁxま", and "まn".
Case 1: if I search for "ぁxまn", it should return only one result.
Case 2: if I search for "ぁx", it should return two results.
Currently, if I search for "ぁxまn", it returns all three results, which is wrong.
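To pin down the two behaviors, here is a small stdlib-only Java sketch that applies both rules to the three example words: "shares any character with the query", which is what the search currently appears to do, and "contains the whole query", which is what cases 1 and 2 ask for. The class and method names are hypothetical, and the code models only the expected results; it does not use Lucene.

```java
import java.util.ArrayList;
import java.util.List;

class MatchSemantics {
    // The three indexed words from the example.
    static final List<String> INDEXED = List.of("ぁxまn", "ぁxま", "まn");

    // Current (unwanted) behavior: a word matches if it shares ANY character with the query.
    static List<String> anyCharacterMatch(String query) {
        List<String> hits = new ArrayList<>();
        for (String word : INDEXED) {
            for (int i = 0; i < query.length(); i++) {
                if (word.indexOf(query.charAt(i)) >= 0) {
                    hits.add(word);
                    break; // one shared character is enough
                }
            }
        }
        return hits;
    }

    // Desired behavior: a word matches only if it contains the whole query contiguously.
    static List<String> wholeWordMatch(String query) {
        List<String> hits = new ArrayList<>();
        for (String word : INDEXED) {
            if (word.contains(query)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(anyCharacterMatch("ぁxまn")); // [ぁxまn, ぁxま, まn]
        System.out.println(wholeWordMatch("ぁxまn"));    // [ぁxまn]  (case 1)
        System.out.println(wholeWordMatch("ぁx"));      // [ぁxまn, ぁxま]  (case 2)
    }
}
```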
-------------------- Indexing code --------------------
IndexWriter writer = createWriter();
// Index one sample document whose firstName is "ぁxまn"
Document document1 = createDocument(1, "ぁxまn", "Richard");
writer.addDocument(document1);
writer.commit();
private static Document createDocument(Integer id, String firstName, String lastName)
{
    Document document = new Document();
    // "id" is indexed as a single untokenized term
    document.add(new StringField("id", id.toString(), Field.Store.YES));
    // TextField values are tokenized by the writer's analyzer (CJKAnalyzer here)
    document.add(new TextField("firstName", firstName, Field.Store.YES));
    document.add(new TextField("lastName", lastName, Field.Store.YES));
    return document;
}
private static IndexWriter createWriter() throws IOException
{
    FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR).toFile());
    // In Lucene 4.x, CJKAnalyzer's constructor takes a Version argument
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, new CJKAnalyzer(Version.LUCENE_44));
    IndexWriter writer = new IndexWriter(dir, config);
    return writer;
}
-------- Calling the search --------
TopDocs foundDocs2 = searchByFirstName("*ぁxまn*", searcher);
-------------------------------------------------------------
private static TopDocs searchByFirstName(String firstName, IndexSearcher searcher) throws Exception
{
    // The query text is analyzed with the same analyzer that was used at index time
    MultiFieldQueryParser mqp = new MultiFieldQueryParser(Version.LUCENE_44, new String[]{"firstName"}, new CJKAnalyzer(Version.LUCENE_44));
    mqp.setAllowLeadingWildcard(true);
    Query q = mqp.parse(firstName);
    TopDocs hits = searcher.search(q, 10);
    return hits;
}
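In token terms, "contains the whole word" means the query's tokens must appear in the document at consecutive positions and in order. Lucene's classic query syntax expresses that kind of phrase match when the search text is wrapped in double quotes, instead of matching any single term. The stdlib-only sketch below models just that position check. The class name is hypothetical, and the token lists are an assumption about what CJKAnalyzer might emit for these strings, not verified output, so inspect the real token stream before relying on it.

```java
import java.util.List;

class PhraseMatchSketch {
    // True if queryTokens occur in docTokens at consecutive positions, in order
    // (the condition an exact phrase match with no slop checks).
    static boolean matchesPhrase(List<String> docTokens, List<String> queryTokens) {
        if (queryTokens.isEmpty()) {
            return false;
        }
        for (int start = 0; start + queryTokens.size() <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + queryTokens.size()).equals(queryTokens)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ASSUMPTION: hypothetical token streams; verify against the analyzer's actual output.
        List<String> doc1 = List.of("ぁ", "x", "ま", "n"); // "ぁxまn"
        List<String> doc2 = List.of("ぁ", "x", "ま");      // "ぁxま"
        List<String> doc3 = List.of("ま", "n");           // "まn"
        List<String> query = List.of("ぁ", "x", "ま", "n");
        System.out.println(matchesPhrase(doc1, query)); // true
        System.out.println(matchesPhrase(doc2, query)); // false
        System.out.println(matchesPhrase(doc3, query)); // false
    }
}
```

Under the same assumed tokens, the query ["ぁ", "x"] matches doc1 and doc2 but not doc3, which lines up with case 2.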