How to do a substring search in Elasticsearch?

Problem description

I am trying to use Elasticsearch to do a substring search.

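from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumption: a local cluster

response = es.search(index='salary_fulltime', body={
    'query': {
        'bool': {
            'must': [
                {'match_phrase': {'title': 'sr. java developer'}},
                {'match_phrase': {'location': 'holtsville'}},
            ]
        }
    }
})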

In my database, I have titles such as:

Senior Java Developer, Java Developer, Java Engineer

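but there is no entry like "sr. java developer".

Is there a way to do a substring match? Even though "Sr." does not appear anywhere in my Elasticsearch index, can "sr. java developer" be matched against what is in our database, such as Senior Java Developer, Java Developer and Java Engineer?

Currently my search does not match anything.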

[{'_id': '484',
 '_index': 'data',
 '_score': 13.8527,
 '_source': {'title': 'Java Developer / Engineer'},
 '_type': '_doc'},
{'_id': '385',
 '_index': 'data',
 '_score': 12.527,
 '_source': {'title': 'Senior Java Developer / Engineer'},
 '_type': '_doc'},
{'_id': '433',
 '_index': 'data',
 '_score': 11.828527,
 '_source': {'title': 'Java Architect'},
 '_type': '_doc'}]

Tags: python, python-3.x, elasticsearch

Solution


Assuming the title field is of the text data type: when no analyzer is defined for a text field, Elasticsearch uses the standard analyzer, which tokenizes "Senior Java Developer" as follows:

{
  "tokens": [
    {
      "token": "senior",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "java",
      "start_offset": 7,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "developer",
      "start_offset": 12,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
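To see where these tokens and offsets come from, here is a rough pure-Python approximation of the standard analyzer for plain ASCII text: split on non-word characters and lowercase each piece. This is only an illustration; approx_standard_analyze is a hypothetical helper, and the real standard tokenizer follows Unicode text segmentation rules (UAX #29) and handles cases this regex does not. On a live cluster the authoritative answer comes from the _analyze API.

```python
import re


def approx_standard_analyze(text):
    """Rough stand-in for the standard analyzer (ASCII text only):
    every run of word characters becomes one lowercased token."""
    tokens = []
    for position, m in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        })
    return tokens


for t in approx_standard_analyze("Senior Java Developer"):
    print(t)
```

Running the same helper on "sr. java developer" drops the trailing dot and yields the tokens sr, java and developer.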

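When you search for "sr. java developer", the query text is analyzed the same way and tokenized into sr, java and developer. A match_phrase query requires all of these tokens to appear next to each other in the same order, so because no document contains the token sr, your query matches nothing. A match query, by contrast, uses the OR operator by default and matches any document that contains at least one of these tokens.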
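The difference between the two query types can be simulated in plain Python. This is a simplified sketch, not how Elasticsearch evaluates queries internally: analyze is a rough regex stand-in for the standard analyzer, match_any models a match query with the default OR operator, and match_phrase models a phrase query with slop 0. Scoring is ignored.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def match_any(query, doc):
    # match query, default "or" operator: any shared token is a hit.
    return bool(set(analyze(query)) & set(analyze(doc)))


def match_phrase(query, doc):
    # match_phrase with slop 0: all query tokens must appear
    # consecutively and in the same order.
    q, d = analyze(query), analyze(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


titles = ["Senior Java Developer", "Java Developer", "Java Engineer"]
query = "sr. java developer"

print([t for t in titles if match_phrase(query, t)])
# → []
print([t for t in titles if match_any(query, t)])
# → ['Senior Java Developer', 'Java Developer', 'Java Engineer']
```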

So you can simply use a match query instead of a match_phrase query:

{
  "query": {
    "match": {
      "title": "sr. java developer"
    }
  }
}
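Applied to the question's Python code, only the title clause needs to change from match_phrase to match. A minimal sketch, assuming you still want location to be matched as an exact phrase; build_body is a hypothetical helper name, while the index name and the es client come from the question.

```python
def build_body(title, location):
    """Search body: token-based 'match' on title, exact phrase on location."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": title}},
                    {"match_phrase": {"location": location}},
                ]
            }
        }
    }


body = build_body("sr. java developer", "holtsville")

# With the client from the question (assumed to exist as `es`):
# response = es.search(index="salary_fulltime", body=body)
# Newer elasticsearch-py versions prefer passing the query directly:
# response = es.search(index="salary_fulltime", query=body["query"])
```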

The search results will be:

"hits": [
      {
        "_index": "67660379",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.6409958,
        "_source": {
          "title": "Java Developer"
        }
      },
      {
        "_index": "67660379",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5403744,
        "_source": {
          "title": "Senior Java Developer"
        }
      },
      {
        "_index": "67660379",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.14181954,
        "_source": {
          "title": "Java Engineer"
        }
      }
    ]

Update 1:

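You can use the minimum_should_match parameter with the match query to require that a share of the query terms match. Elasticsearch rounds percentages down, so with the three terms sr, java and developer, 75% means at least two terms must match, which excludes "Java Engineer":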

{
  "query": {
    "match": {
      "title": {
        "query": "sr. java developer",
        "minimum_should_match": "75%"
      }
    }
  }
}
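The term-count rule behind minimum_should_match can be sketched in plain Python. This is an illustrative simulation only: analyze is a rough regex stand-in for the standard analyzer, and required_terms and match_min_should are hypothetical helpers that model the documented round-down behavior for percentage values. The sketch filters documents but does not score or sort them, so its order differs from the ranked hits Elasticsearch returns.

```python
import re


def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def required_terms(num_terms, percent):
    # Percentage values are rounded down (integer division avoids float error).
    return num_terms * percent // 100


def match_min_should(query, doc, percent):
    # A document matches when it shares at least the required number of terms.
    q_terms = set(analyze(query))
    needed = required_terms(len(q_terms), percent)
    return len(q_terms & set(analyze(doc))) >= needed


titles = ["Senior Java Developer", "Java Developer", "Java Engineer"]
query = "sr. java developer"

print(required_terms(3, 75))
# → 2
print([t for t in titles if match_min_should(query, t, 75)])
# → ['Senior Java Developer', 'Java Developer']
```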

The search results will be:

"hits": [
      {
        "_index": "67660379",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.6409958,
        "_source": {
          "title": "Java Developer"
        }
      },
      {
        "_index": "67660379",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5403744,
        "_source": {
          "title": "Senior Java Developer"
        }
      }
    ]
