How to perform an OR text search on nested documents in Solr

Question

I have indexed a nested document structure in Solr 8.5.1, as shown below:

"docs": [
    {
        "id": "unmatching_parent_and_children",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_and_children.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_and_children.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "matching_parent_unmatching_children",
        "searchtext": "bla searchterm bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "matching_parent_unmatching_children.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "matching_parent_unmatching_children.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_1",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_1.child_1",
                "searchtext": "bla searchterm bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_matching_child_1.child_2",
                "searchtext": "bla bla bla",
                "entity_type": "child_type_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_2",
        "searchtext": "bla bla bla",
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_2.child_1",
                "searchtext": "bla bla",
                "entity_type": "child_type_1"
            },
            {
                "id": "unmatching_parent_matching_child_2.child_2",
                "searchtext": "bla bla searchterm bla",
                "entity_type": "child_type_2"
            }
        ]
    }
]

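I'm looking for a query that runs a text search on searchtext across all parent and child documents and returns a parent when the parent's own searchtext matches, when the searchtext of at least one of its children matches, or when both match.

Something like this (this is pseudocode):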
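To pin down the matching rule described above, here is a minimal plain-Python sketch of it. It does not talk to Solr at all. The helper names (family, has_term, matching_parents) are made up for illustration, and the whitespace-token match is only a rough stand-in for however the searchtext field is actually analyzed.

```python
# Plain-Python model of the desired match rule; this does not query Solr.
# family(), has_term() and matching_parents() are hypothetical helper names.

def family(parent_id, parent_text, child_texts):
    """Build one parent document with numbered children, like the sample data."""
    return {
        "id": parent_id,
        "searchtext": parent_text,
        "entity_type": "parent",
        "_childDocuments_": [
            {
                "id": f"{parent_id}.child_{i}",
                "searchtext": text,
                "entity_type": f"child_type_{i}",
            }
            for i, text in enumerate(child_texts, start=1)
        ],
    }


def has_term(doc, term):
    """Whitespace-token match; a rough stand-in for an analyzed text field."""
    return term in doc["searchtext"].split()


def matching_parents(docs, term):
    """Return ids of parents where the parent OR any of its children match."""
    return [
        parent["id"]
        for parent in docs
        if has_term(parent, term)
        or any(has_term(child, term) for child in parent["_childDocuments_"])
    ]


docs = [
    family("unmatching_parent_and_children", "bla bla bla",
           ["bla bla", "bla bla bla"]),
    family("matching_parent_unmatching_children", "bla searchterm bla bla",
           ["bla bla", "bla bla bla"]),
    family("unmatching_parent_matching_child_1", "bla bla bla",
           ["bla searchterm bla", "bla bla bla"]),
    family("unmatching_parent_matching_child_2", "bla bla bla",
           ["bla bla", "bla bla searchterm bla"]),
]

print(matching_parents(docs, "searchterm"))
```

Only the first family is left out, because neither its parent nor any of its children contains the search term.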

q=(entity_type:parent AND searchtext:searchterm) 
    OR ({!parent which="entity_type:parent"}(-entity_type:parent AND +searchtext:searchterm))
fl=id,[child parentFilter="entity_type:parent"]

Expected result:

"docs": [
    {
        "id": "matching_parent_unmatching_children",
        "_childDocuments_": [
            {
                "id": "matching_parent_unmatching_children.child_1"
            },
            {
                "id": "matching_parent_unmatching_children.child_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_1",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_1.child_1"
            },
            {
                "id": "unmatching_parent_matching_child_1.child_2"
            }
        ]
    },
    {
        "id": "unmatching_parent_matching_child_2",
        "_childDocuments_": [
            {
                "id": "unmatching_parent_matching_child_2.child_1"
            },
            {
                "id": "unmatching_parent_matching_child_2.child_2"
            }
        ]
    }
]

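So far I have not managed to build a Solr query that meets this requirement. The queries I tried either produce a parse error, are interpreted as plain search text with the expressions in them ignored, or only match document structures in which both the parent and a child match searchtext. The query parsers I have tried (in several combinations) are Lucene, eDisMax/DisMax, Block Join Parent, and Simple.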

Tags: solr, nested

Answer


I tried your query against my own data set and also got a parse error:

"Parent query must not match any docs besides parent filter. Combine them as must (+) and must-not (-) clauses to find a problem doc. docID=0"

I don't understand why this parse error occurs.

But when I moved the subquery -entity_type:parent AND +searchtext:searchterm into the filters parameter of the {!parent} query, it worked for me:

q=(entity_type:parent AND searchtext:searchterm) OR ({!parent which="entity_type:parent" filters="-entity_type:parent AND +searchtext:searchterm"})

fl=id,[child parentFilter="entity_type:parent"]

It should return the same result. See also https://lucene.apache.org/solr/guide/8_5/other-parsers.html#filtering-and-tagging-2

I hope this helps you too.
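If you send the request from code, the query has to be URL-encoded, otherwise the + and the quotes get mangled. Below is a rough sketch of building the request URL in Python. The host, port and core name (mycore) are placeholders I made up, build_params is a hypothetical helper, and it assumes the search term is a single plain token that needs no query-syntax escaping.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_params(term):
    """Build q and fl for the parent-OR-child search shown above.

    Assumes `term` is a single plain token; no query-syntax escaping is done.
    """
    parent_filter = "entity_type:parent"
    child_filter = f"-entity_type:parent AND +searchtext:{term}"
    q = (
        f"({parent_filter} AND searchtext:{term}) OR "
        f'({{!parent which="{parent_filter}" filters="{child_filter}"}})'
    )
    fl = f'id,[child parentFilter="{parent_filter}"]'
    return {"q": q, "fl": fl}


params = build_params("searchterm")

# urlencode percent-encodes the literal '+' as %2B and turns spaces into '+'.
url = f"{SOLR_SELECT_URL}?{urlencode(params)}"
print(url)
```

Building the two filter strings once and reusing them keeps the which filter and the parentFilter of the child transformer in sync.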

