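performance - How can I speed up this multi_match Elasticsearch query?
Problem description
How can I speed up this Elasticsearch query? I've found that specifying a small set of fields in multi_match.fields helps. What else can I do? Here is the query; the index mapping is below. The index holds more than 8 million records.
By the way, I sometimes include aggregations in my query. I found that eagerly loading global ordinals for the aggregated fields (eager_global_ordinals in the mapping below) improved the performance of the aggregation part.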
{
  "from": 0,
  "size": 26,
  "timeout": "60s",
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "a",
            "fields": [
              "firstname",
              "lastname",
              "home_address1",
              "home_zip",
              "home_city"
            ],
            "type": "phrase_prefix",
            "operator": "OR",
            "slop": 0,
            "prefix_length": 0,
            "max_expansions": 50,
            "zero_terms_query": "NONE",
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  },
  "_source": {
    "includes": [
      "firstname",
      "lastname",
      "home_address1",
      "home_city"
    ],
    "excludes": []
  },
  "sort": [
    {
      "firstname.keyword": {
        "order": "asc"
      }
    }
  ]
}
Here is the index mapping:
{
  "contacts_3_w0iuvbowu5": {
    "mappings": {
      "properties": {
        "contact_id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "created": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "date_of_birth": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "dist_congress": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "dist_precinct": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "dist_state_house": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "dist_state_senate": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "dist_ward_township": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "email": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "firstname": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "eager_global_ordinals": true,
              "ignore_above": 256
            }
          }
        },
        "fulltext": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_address1": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "eager_global_ordinals": true,
              "ignore_above": 256
            }
          }
        },
        "home_address2": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_city": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "eager_global_ordinals": true,
              "ignore_above": 256
            }
          }
        },
        "home_house_num": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_phone": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_postdirection": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_predirection": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_state": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "home_street_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_street_type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "home_zip": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "eager_global_ordinals": true,
              "ignore_above": 256
            }
          }
        },
        "imported": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "lastname": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "eager_global_ordinals": true,
              "ignore_above": 256
            }
          }
        },
        "list_id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "middlename": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "registration_date": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "registration_status": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "sex": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "state_voter_id": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "suffix": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}
Solution
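Here are a few ideas:
- You don't include zip in _source, but you still allow searching on that field. Is that something your business case actually needs?
- You apply the sort at search time. You can apply the sort order at index time instead; see this part of the docs: https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules-index-sorting.html. Also, setting track_total_hits to false might bring some improvement.
- Consider adding a prefix length. You could set it to 2; in most cases it won't really affect the user experience.
- auto_generate_synonyms_phrase_query: do you need it? See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html. If you don't, set it to false.
- Also consider reducing max_expansions, either across the board or depending on the length of the input (at the application level). Basically, for input like a, you can cap it at 20 or 30. There's no need to go as high as 50; the extra results won't really help anyway (UX-wise).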
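Several of these suggestions could be combined roughly as follows. This is a minimal sketch, assuming the application builds request bodies as plain Python dicts and sends them with whatever client it already uses. The function names, the max_expansions thresholds (20/30/50), and dropping home_zip from the searched fields are illustrative assumptions, not measured recommendations.

```python
import json

# Assumption: home_zip is dropped, per the first suggestion, on the premise
# that the business case does not need to search by ZIP code.
SEARCH_FIELDS = ["firstname", "lastname", "home_address1", "home_city"]
SOURCE_FIELDS = ["firstname", "lastname", "home_address1", "home_city"]


def index_creation_body() -> dict:
    """Body for creating a NEW index whose segments are presorted by firstname.

    Index sorting can only be configured when an index is created, so the
    existing documents would have to be reindexed into it. Only the sort
    field is mapped here; the rest of the original mapping would be copied in.
    """
    return {
        "settings": {
            "index": {
                "sort.field": "firstname.keyword",
                "sort.order": "asc",
            }
        },
        "mappings": {
            "properties": {
                "firstname": {
                    "type": "text",
                    "fields": {
                        "keyword": {"type": "keyword", "ignore_above": 256}
                    },
                }
            }
        },
    }


def max_expansions_for(text: str) -> int:
    """Hypothetical policy: fewer prefix expansions for very short input."""
    length = len(text.strip())
    if length <= 1:
        return 20
    if length <= 3:
        return 30
    return 50  # the phrase_prefix default


def search_body(text: str, size: int = 26) -> dict:
    """Search body applying the query-time suggestions.

    The single-clause bool wrapper from the original query is omitted; a bool
    with one must clause matches exactly the same documents.
    """
    return {
        "from": 0,
        "size": size,
        "timeout": "60s",
        # Skip exact hit counting; combined with a sort that matches the index
        # sort, this lets each shard stop collecting early.
        "track_total_hits": False,
        "query": {
            "multi_match": {
                "query": text,
                "fields": SEARCH_FIELDS,
                "type": "phrase_prefix",
                "max_expansions": max_expansions_for(text),
                # Only has an effect with multi-term synonyms; off otherwise.
                "auto_generate_synonyms_phrase_query": False,
            }
        },
        "_source": SOURCE_FIELDS,
        # Must match index.sort.* for early termination to apply.
        "sort": [{"firstname.keyword": {"order": "asc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(search_body("a"), indent=2))
```

Index sorting trades some indexing throughput for faster sorted searches, so it is worth benchmarking on a reindexed copy before switching the live index over.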