search - How to handle Arabic characters in Solr
Problem description
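I'm trying to make my site ignore the difference between certain Arabic characters during search, e.g. "ه" and "ة". When a user searches for a word ending in "ة", such as "مدينة", the results only include words ending in "ة"; they should also include words ending in "ه", such as "مدينه".
These characters are considered the same in Arabic, so the search results should not differ, but my site returns different results.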
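What is being asked for is a normalization step: fold "ة" (teh marbuta, U+0629) into "ه" (heh, U+0647) so that both spellings produce the same indexed term. The following is a minimal sketch, assuming a hypothetical field type named `text_ar_sketch`. It uses Solr's `PatternReplaceCharFilterFactory` to do the rewrite before tokenization. Because the single `<analyzer>` element has no `type` attribute, the same chain applies at index time and at query time, which is what lets "مدينة" and "مدينه" match.

```xml
<!-- Hypothetical field type for illustration only; not part of the original schema. -->
<fieldType name="text_ar_sketch" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this analyzer is used for both indexing and querying. -->
  <analyzer>
    <!-- Rewrite teh marbuta (U+0629) to heh (U+0647) before the text is tokenized. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ة" replacement="ه"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

A char filter works on the raw character stream, so the replacement happens before any token filter (stop words, stemming) sees the text.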
What I tried in my schema_extra_types.xml:
```xml
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_ar.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ar.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords_ar.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Arabic" protected="protwords_ar.txt"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="accents_ar.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms_ar.txt" expand="true" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ar.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" protected="protwords_ar.txt" splitOnCaseChange="0" generateWordParts="1" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Arabic" protected="protwords_ar.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```
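But when I download the config folder from the Drupal admin interface, accents_ar.txt is empty. Where can I find an example accents_ar.txt to use on my site? Or is there another filter class that handles this?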
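Both routes in the question can be sketched. The mapping file read by `MappingCharFilterFactory` takes one `"source" => "target"` rule per line, the same syntax as the mapping files bundled with Solr. A single line `"\u0629" => "\u0647"` in accents_ar.txt should therefore fold ة into ه. The alternative is a dedicated filter: Lucene provides `solr.ArabicNormalizationFilterFactory`, a token filter that, among other things, normalizes teh marbuta to heh, the hamza-seated alef forms (أ, إ, آ) to bare alef, and alef maksura (ى) to yeh (ي), and strips harakat and tatweel. The sketch below is an abbreviated version of the `text_ar` type from the question with that filter added before the stemmer. The filters left out (synonyms, WordDelimiterGraph, Length, NGram and so on) are assumed to stay where they were. Placing the normalizer after the stop filter mirrors the order used by Lucene's own `ArabicAnalyzer`; it is not a requirement imposed by Drupal.

```xml
<!-- Abbreviated sketch: only the tokenizer and the filters around the new normalizer are shown;
     the other filters from the original text_ar type are assumed to remain unchanged. -->
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ar.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds ة to ه, أ/إ/آ to ا, ى to ي, and removes harakat and tatweel. -->
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Arabic" protected="protwords_ar.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ar.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Must also run at query time; otherwise a query for مدينة would still contain ة. -->
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Arabic" protected="protwords_ar.txt"/>
  </analyzer>
</fieldType>
```

Because index-time analysis changes, documents indexed before the change need to be reindexed before searches for مدينة and مدينه return the same results.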