How do I create a custom analyzer in PyLucene 8.6.1?

Question

I have looked at this, this, and this, but I'm not sure why they don't work for me.

I normally use an analyzer like this:

import lucene
from org.apache.lucene.analysis.core import WhitespaceAnalyzer
from org.apache.lucene.index import IndexWriterConfig, IndexWriter
from org.apache.lucene.store import SimpleFSDirectory
from java.nio.file import Paths
from org.apache.lucene.document import Document, Field, TextField

index_path = "./index"

lucene.initVM()

analyzer = WhitespaceAnalyzer()
config = IndexWriterConfig(analyzer)
store = SimpleFSDirectory(Paths.get(index_path))
writer = IndexWriter(store, config)

doc = Document()
doc.add(Field("title", "The quick brown fox.", TextField.TYPE_STORED))
writer.addDocument(doc)

writer.close()
store.close()

Instead of WhitespaceAnalyzer(), I want to use MyAnalyzer(), which should combine a WhitespaceTokenizer with a LowerCaseFilter.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        # What do I write here?

Can you help me write and use MyAnalyzer()?

Tags: python, search, lucene, full-text-search, pylucene

Answer


Based on what I found here and here, the following works.

from org.apache.lucene.analysis.core import LowerCaseFilter, WhitespaceTokenizer
from org.apache.pylucene.analysis import PythonAnalyzer
from org.apache.lucene.analysis import Analyzer

class MyAnalyzer(PythonAnalyzer):
    def __init__(self):
        PythonAnalyzer.__init__(self)

    def createComponents(self, fieldName):
        source = WhitespaceTokenizer()
        result = LowerCaseFilter(source)
        return Analyzer.TokenStreamComponents(source, result)
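For intuition, the chain that createComponents builds (a tokenizer whose output is wrapped by a filter) can be mimicked in plain Python. This is only a rough sketch: whitespace_tokenize, lower_case_filter and analyze are hypothetical helper names, not PyLucene API, and Python's str.split and str.lower only approximate what Lucene's WhitespaceTokenizer and LowerCaseFilter do.

```python
# Plain-Python stand-in for the analysis chain; none of these names are PyLucene API.

def whitespace_tokenize(text):
    """Split on runs of whitespace, roughly what WhitespaceTokenizer does."""
    return text.split()


def lower_case_filter(tokens):
    """Lowercase every token, roughly what LowerCaseFilter does."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Run the tokenizer first, then pass its output through the filter."""
    return lower_case_filter(whitespace_tokenize(text))


print(analyze("The quick brown fox."))
# prints ['the', 'quick', 'brown', 'fox.']
```

Note that the trailing period stays attached to "fox." because only whitespace separates tokens. In the indexing script from the question, the only change needed is to construct MyAnalyzer() where WhitespaceAnalyzer() was used and pass it to IndexWriterConfig.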

It would be great if someone could point me in the right direction so that I can find answers like this on my own.

