How to combine parent and child nodes in Solr nested documents

Problem description

I am new to Lucene and Solr queries, and I have a question about how to query nested documents.

I have indexed nested documents like the following:

[
  {
    "id": "1",
    "title": "Solr1",
    "_childDocuments_": [
      {
        "id": "2",
        "title": "Solr2",
        "_childDocuments_": [
          {
            "id": "3",
            "title": "Solr3",
            "_childDocuments_": [
              {
                "id": "4",
                "title": "SolrCloud supports it"
              }
            ],
            "something_else":"irrelevant"
          }
        ],
        "something_else":"irrelevant"
      }
    ],
    "something_else":"irrelevant"
  },
  {
    "id": "5",
    "title": "Solr5",
    "_childDocuments_": [
      {
        "id": "6",
        "title": "SolrCloud here as well"
      }
    ]
  }
]

How can I search for title:SolrCloud and get back every matching child together with all of its parents? For example:

[
  {
    "id": "1",
    "title": "Solr1",
    "_childDocuments_": [
      {
        "id": "2",
        "title": "Solr2",
        "_childDocuments_": [
          {
            "id": "3",
            "title": "Solr3",
            "_childDocuments_": [
              {
                "id": "4",
                "title": "SolrCloud supports it"
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "id": "5",
    "title": "Solr5",
    "_childDocuments_": [
      {
        "id": "6",
        "title": "SolrCloud here as well"
      }
    ]
  }
]

This lists all the parents of document 4 (Solr1, Solr2, Solr3) and of document 6 (Solr5). Note that the nesting depth of the documents is not constant.

Tags: solr, lucene, edismax, dismax

Solution


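My current solution is to massage the data before indexing: I add parent_id and trace fields to each child document in the original data, so that I know where every document came from. For example: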

[
  {
    "id": "1",
    "title": "Solr1",
    "_childDocuments_": [
      {
        "id": "2",
        "title": "Solr2",
        "parent_id": "1",
        "trace": "Solr1",
        "_childDocuments_": [
          {
            "id": "3",
            "title": "Solr3",
            "parent_id": "2",
            "trace": "Solr1/Solr2",
            "_childDocuments_": [
              {
                "id": "4",
                "title": "SolrCloud supports it",
                "parent_id": "3",
                "trace": "Solr1/Solr2/Solr3"
              }
            ],
            "something_else":"irrelevant"
          }
        ],
        "something_else":"irrelevant"
      }
    ],
    "something_else":"irrelevant"
  },
  {
    "id": "5",
    "title": "Solr5",
    "_childDocuments_": [
      {
        "id": "6",
        "parent_id": "5",
        "trace": "Solr5",
        "title": "SolrCloud here as well"
      }
    ]
  }
]

After indexing, I can then tell from each result which documents are its parents.
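The data massaging described above could be done with a small preprocessing step before the documents are sent to Solr. The following is only a minimal sketch of that idea: the helper names add_trace and ancestors are hypothetical and are not part of Solr or any client library. It assumes every document has an id and a title, and that titles never contain a "/" character (otherwise the trace could not be split unambiguously).

```python
def add_trace(docs, parent=None):
    """Annotate nested documents in place with parent_id and trace.

    trace is the slash-joined list of ancestor titles, matching the
    hand-written example above. Root documents get neither field.
    """
    for doc in docs:
        if parent is not None:
            doc["parent_id"] = parent["id"]
            parent_trace = parent.get("trace")
            doc["trace"] = (
                f"{parent_trace}/{parent['title']}" if parent_trace else parent["title"]
            )
        # Recurse with the current document as the parent of its children.
        add_trace(doc.get("_childDocuments_", []), doc)
    return docs


def ancestors(hit):
    """Return the ancestor titles of a search hit, outermost first."""
    trace = hit.get("trace")
    return trace.split("/") if trace else []


docs = [
    {"id": "1", "title": "Solr1", "_childDocuments_": [
        {"id": "2", "title": "Solr2", "_childDocuments_": [
            {"id": "3", "title": "Solr3", "_childDocuments_": [
                {"id": "4", "title": "SolrCloud supports it"},
            ]},
        ]},
    ]},
    {"id": "5", "title": "Solr5", "_childDocuments_": [
        {"id": "6", "title": "SolrCloud here as well"},
    ]},
]

add_trace(docs)
doc4 = docs[0]["_childDocuments_"][0]["_childDocuments_"][0]["_childDocuments_"][0]
doc6 = docs[1]["_childDocuments_"][0]
print(ancestors(doc4))  # ['Solr1', 'Solr2', 'Solr3']
print(ancestors(doc6))  # ['Solr5']
```

Because the recursion carries the parent along, this works for any nesting depth, which matters here since the depth is not constant. A hit for title:SolrCloud then carries its full ancestry in trace without a second query.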

Can anyone confirm that this is a reasonable approach? I am still looking for a better solution than this.

