Why does a query for "apache" not match the following document in Elasticsearch?

Question

I have a simple text document. When I fetch it with the following command, it looks like this: curl -X GET "localhost:9200/customer/_doc/1"

{"_index":"customer","_type":"_doc","_id":"1","_version":1,"found":true,"_source":
{
  "description": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
}
}

When I run the query below against this document, Elasticsearch returns no matches, and I would like to know why.

{
    "query": {
        "match" : {
            "description": "apache"
        }
    }
}

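If I replace apache with createXmlDocument or org.apache.crimson.tree.XmlDocument, the query succeeds. I initially assumed that org.apache.crimson.tree.XmlDocument would be split into five words (org, apache, crimson, tree, and XmlDocument), but I now suspect that Elasticsearch stores the whole string org.apache.crimson.tree.XmlDocument as it is. If so, why, and how can I get the desired result?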

Tags: elasticsearch

Solution


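If you do not define an analyzer for the field, the standard analyzer is used.

The standard analyzer produces this token for the class name: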

{
  "token" : "org.apache.crimson.tree.xmldocument",
  "start_offset" : 140,
  "end_offset" : 175,
  "type" : "<ALPHANUM>",
  "position" : 22
}

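The class name is indexed as the single token org.apache.crimson.tree.xmldocument, because the standard tokenizer does not break on a period between two letters. There is no separate apache token, so your search finds nothing. If you use the pattern analyzer instead, the token apache is created. Its default pattern, \W+ (split on any run of non-word characters), works for your case.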
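The difference between the two analyzers can be sketched offline in Python. This is only an illustration: `pattern_tokens` mirrors the default `\W+` split plus lowercasing, while `standard_like_tokens` is a rough regex stand-in for the standard tokenizer (which really follows the Unicode text segmentation rules) and is assumed to be adequate only for this sample sentence. Both function names are hypothetical.

```python
import re

TEXT = (
    "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to "
    "repeatedly access the floppy drive via the createXmlDocument method "
    "in the org.apache.crimson.tree.XmlDocument class, which violates the "
    "Java security model."
)


def pattern_tokens(text):
    # Approximation of the pattern analyzer defaults:
    # lowercase, then split on runs of non-word characters (\W+).
    parts = re.split(r"\W+", text.lower(), flags=re.ASCII)
    return [part for part in parts if part]


# Assumption: a simplified stand-in for the standard tokenizer. It keeps
# word characters joined by a single "." or "'" together, which is NOT the
# full Unicode segmentation algorithm, only enough for this sentence.
_STANDARD_LIKE = re.compile(r"\w+(?:[.']\w+)*", flags=re.ASCII)


def standard_like_tokens(text):
    return _STANDARD_LIKE.findall(text.lower())


if __name__ == "__main__":
    print("pattern :", pattern_tokens(TEXT))
    print("standard:", standard_like_tokens(TEXT))
```

For this sentence, the pattern-style split contains `apache` as its own token, whereas the standard-like sketch keeps `org.apache.crimson.tree.xmldocument` in one piece, so a match query for `apache` has nothing to hit.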

You can verify this with the _analyze API:

curl -XGET "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'
{
  "text": "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model.",
  "analyzer": "pattern"
}'
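The same check can be scripted so that the standard and pattern analyzers are compared side by side. The following is a minimal sketch using only the Python standard library; the helper names `build_analyze_body` and `analyze` are hypothetical, and it assumes an unsecured node at http://localhost:9200 as in the curl command above.

```python
import json
import urllib.request


def build_analyze_body(text, analyzer):
    # JSON body for the _analyze endpoint, encoded as bytes.
    return json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")


def analyze(text, analyzer, base_url="http://localhost:9200"):
    # Assumption: the node is reachable at base_url without authentication.
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=build_analyze_body(text, analyzer),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Each entry in "tokens" carries the token text plus offsets and position.
    return [entry["token"] for entry in payload["tokens"]]
```

With a node running, calling `analyze("org.apache.crimson.tree.XmlDocument", "standard")` and then the same text with `"pattern"` should show the two tokenizations described above.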

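Define an explicit mapping for your index, like this. The analyzer of an existing field cannot be changed, so if the customer index already exists, delete it first and index the document again afterwards: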

PUT customer
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "_doc": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "pattern"
        }
      }
    }
  }
}

If you run the query again, you will get a hit, for example:

  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "customer",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "description" : "Sun Java Plug-In 1.4 through 1.4.2_02 allows remote attackers to repeatedly access the floppy drive via the createXmlDocument method in the org.apache.crimson.tree.XmlDocument class, which violates the Java security model."
        }
      }
    ]
  }
