elasticsearch - Elasticsearch: fuzzy match query does not match values containing an underscore
Problem Description
Suppose I have three documents in my Elasticsearch index, for example:
1: {
  "name": "test_2602"
}
2: {
  "name": "test-2602"
}
3: {
  "name": "test 2602"
}
When I search with the fuzzy match query below:
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "must": [
              {
                "match": {
                  "name": {
                    "query": "test-2602",
                    "fuzziness": "2",
                    "prefix_length": 0,
                    "max_expansions": 50,
                    "fuzzy_transpositions": true,
                    "lenient": false,
                    "zero_terms_query": "NONE",
                    "boost": 1
                  }
                }
              }
            ],
            "disable_coord": false,
            "adjust_pure_negative": true,
            "boost": 1
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}
The response contains only two documents, whether I search for the name "test", "test 2602" or "test-2602":
{
  "name": "test-2602"
},
{
  "name": "test 2602"
}
The document named "test_2602" is not returned, because the value containing an underscore does not match. I want that third document included as well. However, if I search for the name "test_2602", the response I get is:
{
  "name": "test_2602"
}
I need to fetch all three documents whenever I search for the name "test", "test 2602", "test-2602" or "test_2602".
Solution
You are getting only two documents because Elasticsearch uses the standard analyzer by default, which tokenizes "test-2602" and "test 2602" into test and 2602. "test_2602", however, is not split: the standard analyzer does not treat an underscore as a word boundary, so the value stays a single token.
You can check the generated tokens with the analyze API:
GET /_analyze
{
  "analyzer": "standard",
  "text": "test_2602"
}
The generated token will be:
{
  "tokens": [
    {
      "token": "test_2602",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
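To compare all three stored values side by side without a running cluster, the following Python sketch approximates the splitting behaviour described above. It is not the real analyzer: `approx_standard_tokens` is a hypothetical helper that assumes plain ASCII input and relies on the fact that the regex class `\w` includes the underscore, just as the standard analyzer keeps "test_2602" in one piece.

```python
import re

def approx_standard_tokens(text):
    # Rough stand-in for the standard analyzer on simple ASCII input:
    # lowercase the text, then keep runs of letters, digits and underscores.
    # "\w" includes "_", so "test_2602" is not split, while "-" and " " split.
    return re.findall(r"\w+", text.lower(), flags=re.ASCII)

for value in ["test_2602", "test-2602", "test 2602"]:
    print(value, "->", approx_standard_tokens(value))
```

Only the hyphenated and space-separated values produce the tokens test and 2602, which is why a query analyzed into those tokens cannot reach the underscore document.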
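You need to add .keyword to the name field in the query (notice the ".keyword" after the field name). The keyword sub-field is not analyzed, so each value is indexed as a single term and fuzziness is applied to the whole string instead of to the tokens produced by the standard analyzer. Keep in mind that fuzziness allows at most two edits, so this matches queries such as "test-2602" or "test 2602" against "test_2602", but not the much shorter query "test". Try the query below.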
Index Mapping:
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Search Query:
{
  "query": {
    "match": {
      "name.keyword": {
        "query": "test_2602",
        "fuzziness": 2
      }
    }
  }
}
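As a rough illustration of why this query can reach the other stored values, the sketch below computes an edit distance that counts insertions, deletions, substitutions and adjacent transpositions as one edit each. It is a simplified stand-in for the fuzzy matching Elasticsearch performs, not its actual implementation; `edit_distance` and `MAX_EDITS` are hypothetical names chosen for this example.

```python
def edit_distance(a, b):
    # Optimal string alignment distance: insert, delete, substitute and
    # adjacent transposition each cost one edit.
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

MAX_EDITS = 2  # mirrors "fuzziness": 2 in the query
stored = "test_2602"
for query in ["test_2602", "test-2602", "test 2602", "test"]:
    dist = edit_distance(query, stored)
    print(repr(query), "edits:", dist, "within fuzziness:", dist <= MAX_EDITS)
```

The first three queries are 0, 1 and 1 edits away from "test_2602", so they fall within the limit; "test" is 5 edits away and does not.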
Search Result:
"hits": [
  {
    "_index": "66572330",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808291,
    "_source": {
      "name": "test_2602"
    }
  },
  {
    "_index": "66572330",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.8718481,
    "_source": {
      "name": "test 2602"
    }
  },
  {
    "_index": "66572330",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.8718481,
    "_source": {
      "name": "test-2602"
    }
  }
]
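If the shorter search "test" also has to return all three documents, one possible alternative (not part of the answer above, and untested here) is to analyze the name field with a custom analyzer that rewrites "_" to "-" before the standard tokenizer runs, so that every stored value is indexed as test and 2602. The Python sketch below builds such an index body and simulates the effect offline. The names `underscore_to_hyphen`, `name_analyzer` and `simulate_name_analyzer` are made up for this example, and the simulation only approximates the tokenizer for simple ASCII values.

```python
import json
import re

# Hypothetical index body: a "mapping" char filter turns "_" into "-",
# which the standard tokenizer then treats as a separator.
index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                "underscore_to_hyphen": {
                    "type": "mapping",
                    "mappings": ["_ => -"],
                }
            },
            "analyzer": {
                "name_analyzer": {
                    "type": "custom",
                    "char_filter": ["underscore_to_hyphen"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "name_analyzer"}
        }
    },
}

def simulate_name_analyzer(text):
    # Offline approximation only: apply the "_" -> "-" replacement, then
    # split into lowercase runs of letters, digits and underscores.
    return re.findall(r"\w+", text.replace("_", "-").lower(), flags=re.ASCII)

print(json.dumps(index_body, indent=2))  # body to send when creating the index
for value in ["test_2602", "test-2602", "test 2602", "test"]:
    print(value, "->", simulate_name_analyzer(value))
```

Under these assumptions a plain match query on name for any of the four search strings shares at least the token test with every document. Existing documents would need to be reindexed after changing the analyzer.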