Match when a paragraph contains sentences from an Elasticsearch index

Problem description

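I am using Elasticsearch to build a program that finds every place in a text where the Bible is quoted, together with the verse being quoted. I have indexed every verse of the Bible in Elasticsearch, one document per verse. When I search by typing part of a verse, I find the right result even if I make mistakes: the search tolerates errors (using the fuzziness parameter, or synonyms, I think).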

An example of my index:

{"index":{"_index":"test","_type":"","_id":1}}
{"fields":{"year":3560,"book":"1","chapter":1,"section":1,"text":"others words consectetur adipiscing and others words"},"id":"test1","type":"add"}
{"index":{"_index":"test","_type":"","_id":2}}
{"fields":{"year":3560,"book":"2","chapter":3,"section":2,"text":"others words a sagittis nisl quam and others words"},"id":"test2","type":"add"}
{"index":{"_index":"test","_type":"","_id":3}}
{"fields":{"year":3560,"book":"3","chapter":1,"section":5,"text":"others words Aliquam ultrices auctor pharetra and others words"},"id":"test3","type":"add"}
{"index":{"_index":"test","_type":"","_id":4}}
{"fields":{"year":3560,"book":"4","chapter":2,"section":4,"text":"others words Proin ut vestibulum and others words"},"id":"test4","type":"add"}
{"index":{"_index":"test","_type":"","_id":5}}
{"fields":{"year":3560,"book":"5","chapter":1,"section":5,"text":"others words Aenean pretium tincidunt aliquet and others words"},"id":"test5","type":"add"}
{"index":{"_index":"test","_type":"","_id":6}}
{"fields":{"year":3560,"book":"6","chapter":2,"section":1,"text":"others words In vitae sagittis and others words"},"id":"test6","type":"add"}
{"index":{"_index":"test","_type":"","_id":7}}
{"fields":{"year":3560,"book":"7","chapter":7,"section":7,"text":"others words ligula laoreet pharetra and others words"},"id":"test7","type":"add"}
{"index":{"_index":"test","_type":"","_id":8}}
{"fields":{"year":3560,"book":"8","chapter":1,"section":4,"text":"others words luctus eros a pretium and others words"},"id":"test8","type":"add"}
{"index":{"_index":"test","_type":"","_id":9}}
{"fields":{"year":3560,"book":"9","chapter":1,"section":7,"text":"others words ullamcorper eu id quam and others words"},"id":"test9","type":"add"}
{"index":{"_index":"test","_type":"","_id":10}}
{"fields":{"year":3560,"book":"10","chapter":5,"section":4,"text":"others words Nullam ac enim ac lacus hendrerit and others words"},"id":"test10","type":"add"}
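The bulk lines above wrap each verse in a "fields" object and pass an empty "_type". As a hedged sketch, not the asker's tooling, the hypothetical function below builds an equivalent NDJSON body for POST /_bulk in the typeless form used from Elasticsearch 7 onward: it omits "_type", keeps the "fields" nesting so the text stays searchable as fields.text, and leaves out the top-level "id" and "type" keys, which Elasticsearch would otherwise just store as ordinary document fields.

```python
import json


def build_bulk_body(verses, index="test"):
    """Build an NDJSON body for POST /_bulk (hypothetical helper).

    `verses` is a list of dicts with year, book, chapter, section and text
    keys, i.e. the same fields as the question's documents.
    """
    lines = []
    for doc_id, verse in enumerate(verses, start=1):
        # Typeless action line: no _type, as in Elasticsearch 7 and later.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Keep the question's nesting so the verse is searchable as fields.text.
        lines.append(json.dumps({"fields": verse}))
    # The bulk API requires a newline after the last line of the body.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body([
        {"year": 3560, "book": "4", "chapter": 2, "section": 4,
         "text": "others words Proin ut vestibulum and others words"},
    ]), end="")
```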

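I need to find every occurrence of the indexed verses in a paragraph like the one below, so that I can recover where each one comes from: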

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla rhoncus, nulla vitae porta euismod, purus nisl faucibus nunc, sagittis nisl quam id arcu. Sed sit amet arcu sed dui auctor bibendum. Proin ut vestibulum sem, id rutrum felis. Phasellus sagittis justo sit amet justo consequat, id scelerisque eros cursus. Quisque dapibus finibus euismod. Proin dui urna, auctor ut gravida quis, fringilla quis velit. Donec sed pulvinar leo. Sed pulvinar pharetra arcu nec egestas. Mauris non dapibus diam. Pellentesque quis pellentesque libero. Aliquam ultrices auctor pharetra. Cras ullamcorper, odio sit amet aliquam convallis, magna nibh gravida nunc, sit amet volutpat elit purus eget lectus. Pellentesque eu est a risus euismod consequat. Duis id erat porttitor, sodales justo non, aliquet ex. Etiam tincidunt neque ut nisi commodo auctor. Sed congue urna ac tellus scelerisque hendrerit. Mauris lobortis sed dui ut varius. Proin ac luctus felis. In vitae sagittis erat, nec luctus sapien. Aenean pretium tincidunt aliquet. Morbi in enim vel ligula laoreet pharetra. Sed dignissim luctus eros a pretium. Vestibulum molestie molestie nisi, vitae scelerisque nibh bibendum nec. Donec laoreet sapien sed vehicula dictum. Nullam ac enim ac lacus hendrerit tempus et vitae. Quisque at leo pretium, efficitur augue vitae, congue eros. Maecenas volutpat ante nec scelerisque vestibulum. Donec tristique orci erat, nec imperdiet nulla commodo ut. Nam non odio vel quam cursus ullamcorper eu id quam. Duis volutpat, nisl eu interdum mattis, augue ipsum mollis leo, eget efficitur orci augue eget leo. Integer feugiat facilisis dolor ut vehicula. Maecenas quis feugiat massa. Curabitur feugiat odio eget ligula tincidunt sodales. Donec feugiat dapibus lectus, non maximus dui rhoncus vitae. Phasellus eget massa faucibus, tristique nibh sed, aliquet metus.

I don't know if I'm being clear enough, so feel free to ask if you need more details.

I think this is the problem that the Aho-Corasick algorithm solves, but I don't know how to integrate it into Elasticsearch.
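To illustrate the algorithm the question mentions, here is a minimal Python sketch that runs Aho-Corasick in application code, outside Elasticsearch. It is not the asker's code and makes several assumptions: the verse texts are available as a dict keyed by a hypothetical "book:chapter:section" reference, the phrases are the distinctive part of each example verse with the "others words" filler removed, and matching is done on whole lowercase words. Unlike the fuzzy Elasticsearch search, it only finds exact occurrences, so it tolerates no typos.

```python
import re
from collections import deque


def tokenize(text):
    """Lowercase the text and keep only runs of letters (drops punctuation)."""
    return re.findall(r"[a-z]+", text.lower())


class AhoCorasick:
    """Word-level Aho-Corasick automaton: reports every occurrence of any
    indexed phrase in a single left-to-right pass over the text."""

    def __init__(self, phrases):
        # phrases: dict mapping a verse reference to the verse text
        self.goto = [{}]   # goto[state][word] -> next state (trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # (reference, phrase length) ending at each state
        for ref, text in phrases.items():
            self._add(ref, tokenize(text))
        self._build_failure_links()

    def _add(self, ref, words):
        state = 0
        for word in words:
            if word not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][word] = len(self.goto) - 1
            state = self.goto[state][word]
        self.out[state].append((ref, len(words)))

    def _build_failure_links(self):
        # Breadth-first order guarantees that every shallower state already
        # has its failure link and complete output list. Depth-1 states keep
        # their default failure link to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for word, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and word not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(word, 0)
                # Also report shorter phrases that end at the same word.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Return (reference, start word, end word) for each match; the end
        index is exclusive, like a Python slice over tokenize(text)."""
        state, hits = 0, []
        for i, word in enumerate(tokenize(text)):
            while state and word not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(word, 0)
            for ref, length in self.out[state]:
                hits.append((ref, i - length + 1, i + 1))
        return hits


# Distinctive part of each example verse from the question (filler removed);
# the "book:chapter:section" keys are a hypothetical reference format.
VERSES = {
    "1:1:1": "consectetur adipiscing",
    "2:3:2": "a sagittis nisl quam",
    "3:1:5": "Aliquam ultrices auctor pharetra",
    "4:2:4": "Proin ut vestibulum",
    "5:1:5": "Aenean pretium tincidunt aliquet",
    "6:2:1": "In vitae sagittis",
    "7:7:7": "ligula laoreet pharetra",
    "8:1:4": "luctus eros a pretium",
    "9:1:7": "ullamcorper eu id quam",
    "10:5:4": "Nullam ac enim ac lacus hendrerit",
}

if __name__ == "__main__":
    automaton = AhoCorasick(VERSES)
    excerpt = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
               "Proin ut vestibulum sem, id rutrum felis.")
    for ref, start, end in automaton.search(excerpt):
        print(ref, start, end)
```

The automaton is built once from all verses, and each paragraph is then scanned in time proportional to its length plus the number of matches, no matter how many verses are indexed. Fuzzy matching would still need Elasticsearch or an approximate-matching technique on top of this.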

Thanks!

Tags: algorithm, elasticsearch, search, text, full-text-search

Solution


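If I understand your question correctly, all you are looking for is to send "some partial verse" as the query and get back the source document from Elasticsearch, with the searched verse marked inside it in the response (that is what highlighting does).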

Here is the simplest query that does this:

GET <index_name>/_search
{
  "query": {
    "match": {
      "message": "partial verse"
    }
  },
  "highlight": {
    "fields": {
      "message": {}
    }
  }
}
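To apply this to a whole paragraph rather than a single typed fragment, one option is to split the paragraph into sentences and send one such query per sentence in a single multi-search (_msearch) request. The Python sketch below only builds the NDJSON body and does not contact a cluster; the index name "test", the field "fields.text" (taken from the question's bulk data), the naive sentence splitter, the fuzziness setting and both function names are assumptions for illustration.

```python
import json
import re


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def build_msearch_body(paragraph, index="test", field="fields.text"):
    """Build an NDJSON _msearch body with one fuzzy match query (plus
    highlighting) per sentence of the paragraph."""
    lines = []
    for sentence in split_sentences(paragraph):
        # Header line: which index this search runs against.
        lines.append(json.dumps({"index": index}))
        # Body line: the same match + highlight request as above, per sentence.
        lines.append(json.dumps({
            "query": {"match": {field: {"query": sentence, "fuzziness": "AUTO"}}},
            "highlight": {"fields": {field: {}}},
        }))
    # The _msearch endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_msearch_body(
        "Proin ut vestibulum sem. Aenean pretium tincidunt aliquet."), end="")
```

The body would be sent as POST /_msearch with Content-Type: application/x-ndjson, and each entry of the response's "responses" array has the usual hits and highlight structure. Since a match query combines its terms with OR by default, most sentences will return some hit, so the scores and highlighted fragments still need to be checked before treating a sentence as a quotation.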

In response, you will get something like this:

"hits" : [
      {
        "_index" : "testsample",
        "_type" : "_doc",
        "_id" : "TkdvGXAB5bHyIJQ-QRow",
        "_score" : 0.2876821,
        "_source" : {
          "bookName" : "bible",
          "message" : "this is a good book"
        },
        "highlight" : {
          "message" : [
            "<em>this</em> is a good book"
          ]
        }
      }
    ]

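The response is self-explanatory: the matching document is returned in _source, and the highlighted results come back in a separate highlight section, with each matched term wrapped in <em> tags.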
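When the response is processed in code, the highlighted fragments can be read hit by hit. A minimal Python sketch, assuming the response has already been parsed into a dict (for example with json.loads) and uses the "message" field from the example above; the function name extract_highlights is hypothetical.

```python
import re


def extract_highlights(response, field="message"):
    """Return (doc id, highlighted fragment, matched terms) for every hit."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        for fragment in hit.get("highlight", {}).get(field, []):
            # The default highlight tags wrap each matched term in <em>...</em>.
            terms = re.findall(r"<em>(.*?)</em>", fragment)
            results.append((hit["_id"], fragment, terms))
    return results


if __name__ == "__main__":
    # Trimmed version of the sample response shown above.
    sample = {"hits": {"hits": [{
        "_id": "TkdvGXAB5bHyIJQ-QRow",
        "_source": {"bookName": "bible", "message": "this is a good book"},
        "highlight": {"message": ["<em>this</em> is a good book"]},
    }]}}
    for doc_id, fragment, terms in extract_highlights(sample):
        print(doc_id, terms, fragment)
```

Hits without a highlight entry for the field are skipped, so the result only lists documents where the query terms were actually found in that field.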

