mongo: text search using a regular expression

Problem description

I have a collection named test with the following data:

> db.test.find()
{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }
{ "_id" : ObjectId("5ae349645daab479a87f51fc"), "a" : "a7", "b" : "b7", "c" : "c7", "__key" : "default-domain:admin:vn2;c9" }
{ "_id" : ObjectId("5ae349af5daab479a87f51fd"), "a" : "a0", "b" : "b0", "c" : "c0", "__key" : "a0;b0;c0" }
{ "_id" : ObjectId("5ae349be5daab479a87f51fe"), "a" : "a1", "b" : "b1", "c" : "c1", "__key" : "a1;b1;c1" }
{ "_id" : ObjectId("5ae349cc5daab479a87f51ff"), "a" : "a2", "b" : "b1", "c" : "c2", "__key" : "a2;b2;c2" }
{ "_id" : ObjectId("5ae349d75daab479a87f5200"), "a" : "a3", "b" : "b2", "c" : "c3", "__key" : "a3;b3;c3" }
{ "_id" : ObjectId("5ae34b6c5daab479a87f5201"), "a" : "a8", "b" : "b8", "c" : "c9", "__key" : "default-domain:vn9;ch9" }
> 

I created the index as follows:

db.test.createIndex({__key: "text"})

Now I want to search for documents whose __key matches the string default-domain:*c8

> db.test.find({$text: {$search: "/default-domain:*c8/"}})
{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }
{ "_id" : ObjectId("5ae34b6c5daab479a87f5201"), "a" : "a8", "b" : "b8", "c" : "c9", "__key" : "default-domain:vn9;ch9" }
{ "_id" : ObjectId("5ae349645daab479a87f51fc"), "a" : "a7", "b" : "b7", "c" : "c7", "__key" : "default-domain:admin:vn2;c9" }
> 

So it returns the wrong data. I only expect this document to be returned:

{ "_id" : ObjectId("5ae3494a5daab479a87f51fb"), "a" : "a6", "b" : "b6", "c" : "c6", "__key" : "default-domain:admin:vn1;c8" }

From explain() I see:

    "winningPlan" : {
        "stage" : "TEXT",
        "indexPrefix" : {

        },
        "indexName" : "__key_text",
        "parsedTextQuery" : {
            "terms" : [
                "c8",
                "default",
                "domain"
            ],
            "negatedTerms" : [ ],
            "phrases" : [ ],
            "negatedPhrases" : [ ]
        },

So here the search string is internally converted into 3 terms:

            "terms" : [
                "c8",
                "default",
                "domain"
            ],

I think this is why it returns the wrong data.

So how can I achieve this with the text index, i.e. what should go in db.test.find({$text: {$search: "??"}})? Is my search expression wrong?

Regards, -M-

Tags: regex, mongodb, mongodb-query

Solution


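The text index is behaving as expected: it tokenizes and stems the indexed terms, and $search is processed the same way. $text does not interpret regular-expression syntax, so characters such as "/", ":", "*" and "-" simply act as delimiters. That is why your search string is split into three separate terms in the explain plan. Because a $text search with several terms (and no quoted phrase) matches documents that contain any one of the terms, every key containing "default" or "domain" is returned, including keys that do not contain "c8".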

For tokenization, see https://docs.mongodb.com/manual/core/index-text/#tokenization-delimiters ; for stemming and stop words, see https://docs.mongodb.com/manual/core/index-text/#index-entries .
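To make the behaviour concrete, here is a simplified sketch in plain JavaScript. It approximates the text-index tokenizer by lowercasing and splitting on any non-alphanumeric character (an assumption: the real tokenizer uses MongoDB's own delimiter set and also applies stemming and stop words), and then applies the "match any term" rule. The names tokenize and textSearch are illustrative only.

```javascript
// Simplified approximation of text-index tokenization (assumption: split on
// any non-alphanumeric character and lowercase; the real text index also
// stems words and drops stop words).
function tokenize(text) {
  return [...new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean))];
}

// A $text search with several terms and no quoted phrase matches a document
// that contains ANY of the terms (logical OR).
function textSearch(docs, search) {
  const terms = tokenize(search);
  return docs.filter((doc) => {
    const docTokens = new Set(tokenize(doc.__key));
    return terms.some((t) => docTokens.has(t));
  });
}

// __key values from the question's test collection.
const docs = [
  { __key: "default-domain:admin:vn1;c8" },
  { __key: "default-domain:admin:vn2;c9" },
  { __key: "a0;b0;c0" },
  { __key: "a1;b1;c1" },
  { __key: "a2;b2;c2" },
  { __key: "a3;b3;c3" },
  { __key: "default-domain:vn9;ch9" },
];

console.log(tokenize("/default-domain:*c8/")); // [ 'default', 'domain', 'c8' ]
// The same three keys the question got back: all contain "default".
console.log(textSearch(docs, "/default-domain:*c8/").map((d) => d.__key));
```

The slashes and the "*" disappear during tokenization, so there is no way to express "starts with X and ends with Y" through $search.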

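If you require "default-domain" to be at the beginning when querying for "c8", you may want to consider a case-sensitive prefix expression (https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use) and use "$" at the end of your regular expression to anchor "c8" (see http://grainge.org/pages/authoring/regex/regular_expressions.htm). Note that a $regex query cannot use the text index; it can use a regular index on "__key" instead.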
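A minimal sketch of the anchored-regex approach, checked in plain JavaScript against the question's sample keys; the commented lines show the corresponding mongo shell calls, assuming an ordinary ascending index on __key alongside (not instead of) the text index. Also note that in the question's pattern default-domain:*c8, ":*" means "zero or more colons", not "anything"; the wildcard has to be ".*".

```javascript
// Corresponding mongo shell calls (run these in the shell, not in Node):
//   db.test.createIndex({ __key: 1 })
//   db.test.find({ __key: /^default-domain:.*c8$/ })
//
// ^default-domain:  case-sensitive prefix, which lets the index narrow the scan
// .*                any characters in between
// c8$               the key must end with "c8"
const keyPattern = /^default-domain:.*c8$/;

// __key values from the question's test collection.
const keys = [
  "default-domain:admin:vn1;c8",
  "default-domain:admin:vn2;c9",
  "a0;b0;c0",
  "a1;b1;c1",
  "a2;b2;c2",
  "a3;b3;c3",
  "default-domain:vn9;ch9",
];

const matches = keys.filter((k) => keyPattern.test(k));
console.log(matches); // [ 'default-domain:admin:vn1;c8' ]
```

Only the document the question expects survives, because the other "default-domain" keys end in "c9" and "ch9".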

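Alternatively, you could parse the value of the "__key" field, store the relevant parts in separate fields, and query those values directly.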
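A sketch of the parsing approach, under an assumption the question does not state: that __key has the form "&lt;domain&gt;:...;&lt;tail&gt;", i.e. the domain is the part before the first ":" and the searched value is the part after the last ";". The helper parseKey and the field names keyDomain and keyTail are hypothetical.

```javascript
// Hypothetical helper: derive searchable fields from a __key value.
// Assumes "<domain>:<...>;<tail>"; keys without ":" get keyDomain = null.
function parseKey(key) {
  const colon = key.indexOf(":");
  const semi = key.lastIndexOf(";");
  return {
    keyDomain: colon === -1 ? null : key.slice(0, colon),
    keyTail: semi === -1 ? key : key.slice(semi + 1),
  };
}

// Once the derived fields are stored, the mongo shell query is an exact match:
//   db.test.createIndex({ keyDomain: 1, keyTail: 1 })
//   db.test.find({ keyDomain: "default-domain", keyTail: "c8" })
const docs = [
  "default-domain:admin:vn1;c8",
  "default-domain:admin:vn2;c9",
  "a0;b0;c0",
  "default-domain:vn9;ch9",
].map((k) => ({ __key: k, ...parseKey(k) }));

const matches = docs.filter(
  (d) => d.keyDomain === "default-domain" && d.keyTail === "c8"
);
console.log(matches.map((d) => d.__key)); // [ 'default-domain:admin:vn1;c8' ]
```

Storing the parts at write time moves the string handling out of the query, so an equality match on a compound index can replace the regular expression.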

