java - Partial query with a WHERE-style clause using the Elasticsearch 7 Java API
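Problem description
I am using the code below for searching, and it works fine, but it only returns results when a full word matches. I want results for partial queries as well (an incomplete word with at least 3 matching characters). There is one more condition: my documents have a campus field with values such as campus: "Bradford", campus: "Oxford", campus: "Harvard", and so on. I want my query to return the documents where campus is Bradford or Oxford and where Nel appears anywhere in the rest of the document.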
RestHighLevelClient client;
QueryBuilder matchQueryBuilder = QueryBuilders.queryStringQuery("Nel");
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(matchQueryBuilder);
SearchRequest searchRequest = new SearchRequest("index_name");
searchRequest.source(sourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
In SQL terms, the campus condition is like where campus='Bradford' OR campus='Oxford'.
The document contains "Nelson Mandela II". Currently the search works if I use Nelson as the query, but I need it to work with the query Nel.
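Solution
There are basically two possible approaches to the use case you describe.
Solution 1: Using a wildcard query
Suppose you have two fields:
name of type text
campus of type text
Below is how your Java code would look: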
private static void wildcardQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
        throws IOException {
    System.out.println("-----------------------------------------------------");
    System.out.println("Wildcard Query");
    MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
    MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
    //Using wildcard query
    WildcardQueryBuilder nameClause = QueryBuilders.wildcardQuery("name", "nel*");
    //Main Query
    BoolQueryBuilder query = QueryBuilders.boolQuery()
            .must(nameClause)
            .should(campusClause_1)
            .should(campusClause_2)
            .minimumShouldMatch(1);
    sourceBuilder.query(query);
    SearchRequest searchRequest = new SearchRequest();
    //specify your index name in the below parameter
    searchRequest.indices("my_wildcard_index");
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(searchResponse.getHits().getTotalHits());
    System.out.println("-----------------------------------------------------");
}
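To make the query logic concrete, here is a minimal plain-Java sketch of what the bool query above asks for, with no Elasticsearch involved. The class BoolQuerySketch and its helper methods are hypothetical, and the lowercase-and-split-on-whitespace step is only a rough stand-in for the standard analyzer. The must clause corresponds to the nel* wildcard on name, and the two should clauses with minimumShouldMatch(1) correspond to where campus='Bradford' OR campus='Oxford'.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class BoolQuerySketch {

    // Rough stand-in for the standard analyzer: lowercase, then split on whitespace.
    static List<String> terms(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // must:   at least one term of name starts with the prefix (the "nel*" wildcard).
    // should: with minimum_should_match = 1, at least one campus value has to match.
    static boolean matches(String name, String campus, String prefix, Set<String> campuses) {
        boolean must = terms(name).stream().anyMatch(t -> t.startsWith(prefix));
        boolean should = terms(campus).stream().anyMatch(campuses::contains);
        return must && should;
    }

    public static void main(String[] args) {
        Set<String> campuses = new HashSet<>(Arrays.asList("oxford", "bradford"));
        System.out.println(matches("Nelson Mandela II", "Bradford", "nel", campuses)); // true
        System.out.println(matches("Nelson Mandela II", "Harvard", "nel", campuses));  // false
        System.out.println(matches("Mandela", "Oxford", "nel", campuses));             // false
    }
}
```

The prefix is written in lowercase because, in this sketch, the terms it is compared against have already been lowercased, which mirrors how the nel* pattern is written in the wildcard query above.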
Note that if the fields above are of type keyword and you need a case-sensitive exact match, you would need the following code instead:
TermQueryBuilder campusClause_2 = QueryBuilders.termQuery("campus", "Bradford");
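The difference between the two clause types can be illustrated without a cluster. The sketch below is a deliberate simplification: TermVsMatchSketch, termMatches, matchMatches and analyze are hypothetical names, a keyword field is modelled as the untouched stored string, and a text field as lowercased whitespace-separated terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TermVsMatchSketch {

    // keyword field + term query: the stored value is compared as-is, so case matters.
    static boolean termMatches(String storedKeyword, String queryValue) {
        return storedKeyword.equals(queryValue);
    }

    // text field + match query: both sides are analyzed, and the document matches
    // if any analyzed query term is among the analyzed document terms.
    static boolean matchMatches(String storedText, String queryText) {
        List<String> indexed = analyze(storedText);
        return analyze(queryText).stream().anyMatch(indexed::contains);
    }

    // Simplified analysis: lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(termMatches("Bradford", "Bradford"));  // true
        System.out.println(termMatches("Bradford", "bradford"));  // false
        System.out.println(matchMatches("Bradford", "bradford")); // true
    }
}
```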
Solution 2: Using the Edge Ngram tokenizer (preferred solution)
For this, you need to use the Edge Ngram tokenizer.
Below is how your mapping would look:
Mapping:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": "lowercase",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "campus": {
        "type": "text"
      }
    }
  }
}
Sample documents:
PUT my_index/_doc/1
{
  "name": "Nelson Mandela",
  "campus": "Bradford"
}
PUT my_index/_doc/2
{
  "name": "Nel Chaz",
  "campus": "Oxford"
}
Query DSL:
POST my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "nel"
          }
        }
      ],
      "should": [
        {
          "match": {
            "campus": "bradford"
          }
        },
        {
          "match": {
            "campus": "oxford"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
Java code:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
        throws IOException {
    System.out.println("-----------------------------------------------------");
    System.out.println("Bool Query");
    MatchQueryBuilder campusClause_1 = QueryBuilders.matchQuery("campus", "oxford");
    MatchQueryBuilder campusClause_2 = QueryBuilders.matchQuery("campus", "bradford");
    //Plain old match query would suffice here
    MatchQueryBuilder nameClause = QueryBuilders.matchQuery("name", "nel");
    BoolQueryBuilder query = QueryBuilders.boolQuery()
            .must(nameClause)
            .should(campusClause_1)
            .should(campusClause_2)
            .minimumShouldMatch(1);
    sourceBuilder.query(query);
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("my_index");
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(searchResponse.getHits().getTotalHits());
}
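Notice how I have used a plain match query for the name field. I suggest reading a bit about analysis, analyzers, tokenizers, and the edge-ngram tokenizer.
In the console, you should see the total number of hits.
Similarly, you can use other query types in the solutions above, for example a Term query if you are looking for an exact match on a keyword field.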
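Updated answer:
Personally, I do not recommend Solution 1, because it wastes a lot of computing power even for a single field, let alone for multiple fields.
To do multi-field substring matching, the best approach is to use a concept called copy_to and then apply the Edge N-Gram tokenizer to that field.
So what does the Edge N-Gram tokenizer actually do? Put simply, based on min_gram and max_gram, it breaks your tokens down into prefixes, for example
Zeppelin into Zep, Zepp, Zeppe, Zeppel, Zeppeli, Zeppelin
and inserts these values into the inverted index for that field. If you then run a very simple match query, it will return that document, because the inverted index contains that substring.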
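The prefix expansion described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: EdgeNgramSketch and edgeNgrams are hypothetical names, the method handles a single token that has already been split off, and it lowercases the token to mimic the lowercase filter used in the analyzers in this answer. It is not the Elasticsearch implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramSketch {

    // Emits every prefix of the token whose length lies in [minGram, maxGram],
    // lowercased to mimic the "lowercase" filter of the analyzer.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        String lower = token.toLowerCase(Locale.ROOT);
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [zep, zepp, zeppe, zeppel, zeppeli, zeppelin]
        System.out.println(edgeNgrams("Zeppelin", 3, 10));
    }
}
```

With min_gram set to 3, a two-character input produces no grams at all in this sketch, which lines up with the requirement of at least 3 matching characters.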
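About the copy_to field:
The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field.
Using copy_to, we have the mapping below for the two fields campus and name.
Mapping: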
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": "lowercase",
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "copy_to": "search_string" <---- Note this
      },
      "campus": {
        "type": "text",
        "copy_to": "search_string" <---- Note this
      },
      "search_string": {
        "type": "text",
        "analyzer": "my_analyzer" <---- Note this
      }
    }
  }
}
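Note in the mapping above how I have used the Edge N-gram analyzer only for search_string. Keep in mind that this consumes disk space, so you may want to step back and make sure you do not use this analyzer on every field, although that again depends on your use case.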
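As a rough mental model of what ends up searchable in search_string, the following plain-Java sketch collects the edge n-grams of every value copied into the combined field. The names CopyToSketch and indexSearchString are hypothetical, the regular expression is only an approximation of token_chars set to letter and digit, and the real index stores far more than a set of strings.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class CopyToSketch {

    // Collects the lowercased edge n-grams of every value copied into one combined field.
    static Set<String> indexSearchString(int minGram, int maxGram, String... fieldValues) {
        Set<String> indexed = new LinkedHashSet<>();
        for (String value : fieldValues) {
            // Split on anything that is not a letter or a digit.
            for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                int longest = Math.min(maxGram, token.length());
                for (int len = minGram; len <= longest; len++) {
                    indexed.add(token.substring(0, len));
                }
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        // name and campus both feed the same combined field.
        Set<String> indexed = indexSearchString(3, 10, "Ramanujan", "Cambridge University");
        System.out.println(indexed.contains("ram")); // true, prefix of the name
        System.out.println(indexed.contains("cam")); // true, prefix of the campus
        System.out.println(indexed.contains("uni")); // true, second word of the campus
        System.out.println(indexed.contains("anu")); // false, not at the start of a word
    }
}
```

The last line shows the limit of the edge variant in this sketch: only substrings anchored at the start of a word are covered.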
Sample document:
POST my_index/_doc/1
{
  "campus": "Cambridge University",
  "name": "Ramanujan"
}
Search query:
POST my_index/_search
{
  "query": {
    "match": {
      "search_string": "ram"
    }
  }
}
This gives you simple Java code like the following:
private static void boolMatchQuery(RestHighLevelClient client, SearchSourceBuilder sourceBuilder)
        throws IOException {
    System.out.println("-----------------------------------------------------");
    System.out.println("Bool Query");
    MatchQueryBuilder searchClause = QueryBuilders.matchQuery("search_string", "ram");
    //Feel free to add multiple clauses
    BoolQueryBuilder query = QueryBuilders.boolQuery()
            .must(searchClause);
    sourceBuilder.query(query);
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("my_index");
    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(searchResponse.getHits().getTotalHits());
}
Hope it helps!