Wildcarded, unfiltered search issue - MarkLogic

Problem description

MarkLogic version: 8.0-6.3

Let me explain the problem with an example.

Insert the following documents into the database:

xdmp:document-insert('/sample/1.xml', <data>Türkiye Araştırmaları Literatür Dergisi</data>);
xdmp:document-insert('/sample/2.xml', <data>Türk-İslâm Medeniyeti Akademik Araştırmalar Dergisi/Journal of the Academic Studies of Turkish-Islamic Civilization</data>);
xdmp:document-insert('/sample/3.xml', <data>Österreich in Geschichte und Literatur (mit Geographie)</data>);
xdmp:document-insert('/sample/4.xml', <data>Uluslararası Karadeniz Havzası Halk Bilimi Araştırmaları Dergisi</data>);
xdmp:document-insert('/sample/5.xml', <data>Süleyman Demirel Üniversitesi Fen-Edebiyat Fakültesi Sosyal Bilimler Dergisi</data>);
xdmp:document-insert('/sample/6.xml', <data>Tarih İncelemeleri Dergisi</data>);
xdmp:document-insert('/sample/7.xml', <data>Literatur und Kritik</data>);
xdmp:document-insert('/sample/8.xml', <data>Cumhuriyet Tarihi Araştırmaları Dergisi</data>);
xdmp:document-insert('/sample/9.xml', <data>Divan Edebiyatı Araştırmaları Dergisi/The Journal of Ottoman Literature Studies</data>);
xdmp:document-insert('/sample/10.xml', <data>Krieg und Literatur/War and Literature</data>);
xdmp:document-insert('/sample/11.xml', <data>Trakya Üniversitesi Edebiyat Fakültesi Dergisi</data>);
xdmp:document-insert('/sample/12.xml', <data>Jahrbuch zur Kultur und Literatur der Weimarer Republik</data>);

cts query:

cts:search(
      doc(),
      cts:element-word-query(
          xs:QName('data'), 
          "Türk?ye Arast?rmalar? L?teratür Derg?s?",
          ("case-insensitive","diacritic-insensitive","punctuation-insensitive","stemmed","wildcarded","lang=en")
       ),
      'unfiltered'
)

Output:

All of the documents inserted above are returned.

Expected output:

Only /sample/1.xml should be returned.

Database configuration:

<config>
    <name>content</name>
    <package-database-properties>
        <enabled>true</enabled>
        <retired-forest-count>0</retired-forest-count>
        <language>en</language>
        <stemmed-searches>advanced</stemmed-searches>
        <word-searches>true</word-searches>
        <word-positions>true</word-positions>
        <fast-phrase-searches>true</fast-phrase-searches>
        <fast-reverse-searches>false</fast-reverse-searches>
        <triple-index>false</triple-index>
        <triple-positions>false</triple-positions>
        <fast-case-sensitive-searches>true</fast-case-sensitive-searches>
        <fast-diacritic-sensitive-searches>true</fast-diacritic-sensitive-searches>
        <fast-element-word-searches>true</fast-element-word-searches>
        <element-word-positions>true</element-word-positions>
        <fast-element-phrase-searches>true</fast-element-phrase-searches>
        <element-value-positions>true</element-value-positions>
        <attribute-value-positions>true</attribute-value-positions>
        <field-value-searches>true</field-value-searches>
        <field-value-positions>true</field-value-positions>
        <three-character-searches>true</three-character-searches>
        <three-character-word-positions>true</three-character-word-positions>
        <fast-element-character-searches>true</fast-element-character-searches>
        <trailing-wildcard-searches>true</trailing-wildcard-searches>
        <trailing-wildcard-word-positions>true</trailing-wildcard-word-positions>
        <fast-element-trailing-wildcard-searches>true</fast-element-trailing-wildcard-searches>
        <word-lexicons>
            <word-lexicon>http://marklogic.com/collation/codepoint</word-lexicon>
        </word-lexicons>
        <two-character-searches>false</two-character-searches>
        <one-character-searches>false</one-character-searches>
        <uri-lexicon>true</uri-lexicon>
        <collection-lexicon>true</collection-lexicon>
        <reindexer-enable>true</reindexer-enable>
        <reindexer-throttle>5</reindexer-throttle>
        <reindexer-timestamp>0</reindexer-timestamp>
        <directory-creation>manual</directory-creation>
        <maintain-last-modified>false</maintain-last-modified>
        <maintain-directory-last-modified>false</maintain-directory-last-modified>
        <inherit-permissions>false</inherit-permissions>
        <inherit-collections>false</inherit-collections>
        <inherit-quality>false</inherit-quality>
        <in-memory-limit>262144</in-memory-limit>
        <in-memory-list-size>512</in-memory-list-size>
        <in-memory-tree-size>128</in-memory-tree-size>
        <in-memory-range-index-size>16</in-memory-range-index-size>
        <in-memory-reverse-index-size>16</in-memory-reverse-index-size>
        <in-memory-triple-index-size>64</in-memory-triple-index-size>
        <large-size-threshold>1024</large-size-threshold>
        <locking>fast</locking>
        <journaling>fast</journaling>
        <journal-size>2047</journal-size>
        <journal-count>2</journal-count>
        <preallocate-journals>false</preallocate-journals>
        <preload-mapped-data>false</preload-mapped-data>
        <preload-replica-mapped-data>false</preload-replica-mapped-data>
        <range-index-optimize>facet-time</range-index-optimize>
        <positions-list-max-size>256</positions-list-max-size>
        <format-compatibility>automatic</format-compatibility>
        <index-detection>automatic</index-detection>
        <expunge-locks>none</expunge-locks>
        <tf-normalization>scaled-log</tf-normalization>
        <merge-priority>lower</merge-priority>
        <merge-max-size>49152</merge-max-size>
        <merge-min-size>1024</merge-min-size>
        <merge-min-ratio>1</merge-min-ratio>
        <merge-timestamp>0</merge-timestamp>
        <retain-until-backup>false</retain-until-backup>
        <rebalancer-enable>true</rebalancer-enable>
        <rebalancer-throttle>5</rebalancer-throttle>
        <assignment-policy>
            <assignment-policy-name>bucket</assignment-policy-name>
        </assignment-policy>
    </package-database-properties>
    <links>
        <forests-list>
            <forest-name>r-f4</forest-name>
            <forest-name>r-f3</forest-name>
            <forest-name>r-f2</forest-name>
            <forest-name>r-f1</forest-name>
        </forests-list>
        <security-database>Security</security-database>
        <schema-database>Schemas</schema-database>
        <triggers-database>Triggers</triggers-database>
    </links>
</config>

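I can't work out what is going wrong. Why am I getting the wrong output?

It seems that a document is returned as a match if even a single word from the query is present in its data element.

Please help me understand what I am doing wrong.

Update:

xdmp:plan output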

Tags: marklogic

Solution


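Look at the xdmp:plan output for the search. I expect the insensitive options are defeating the wildcard optimization in such a way that you end up with a very weak query.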
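
One way to check this is sketched below. It reuses the query from the question unchanged, asks xdmp:plan how the indexes resolve it, and then compares the unfiltered candidate count with the filtered count. The variable name $query is introduced here only for illustration. Treat this as a diagnostic sketch, not a fix: what the plan contains depends on the index settings and the server version.

```xquery
xquery version "1.0-ml";

(: The query from the question, bound to a variable so it can be reused.
   $query is an illustrative name, not part of the original post. :)
let $query :=
  cts:element-word-query(
    xs:QName("data"),
    "Türk?ye Arast?rmalar? L?teratür Derg?s?",
    ("case-insensitive", "diacritic-insensitive", "punctuation-insensitive",
     "stemmed", "wildcarded", "lang=en"))
return (
  (: Index-resolution plan: shows which terms the indexes actually constrain on. :)
  xdmp:plan(cts:search(fn:doc(), $query, "unfiltered")),

  (: Number of candidates selected by the indexes alone. :)
  fn:count(cts:search(fn:doc(), $query, "unfiltered")),

  (: Number of results after each candidate is checked against the query. :)
  fn:count(cts:search(fn:doc(), $query, "filtered"))
)
```

If the plan lists only a few weak terms and the two counts differ, the unfiltered result set is made up of index false positives. A filtered search removes them, at the cost of opening every candidate document.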

