Return the number of documents based on the word count in a field's string

Problem description

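How can I return the number of documents in which the "words" list has more than 2 elements and "word_combination" contains more than 3 words? Is there a way to count the number of words in a string?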

Example: if (length of "words" > 2) AND ("words.word_combination" has more than 3 words), return the document.

I have many documents stored. The structure of one document looks like this:

"_source" : {
"group_words" : [

  {
    "amount" : 1140,
    "words" : [
      {
        "relevance_score" : 56,
        "points" : 66461,
        "bits" : 100,
        "word_combination" : "cat dog"
      },
      {
        "relevance_score" : 84,
        "points" : 45202,
        "bits" : 990,
        "word_combination" : "cat dog elephant"
      },
      {
        "relevance_score" : 99,
        "points" : 30974,
        "bits" : 70,
        "word_combination" : "elephant cat mouse leopard"
      }
    ],
    "group" : "whatever"
  },
  {
    "amount" : 1320,
    "words" : [
      {
        "relevance_score" : 25,
        "points" : 53396,
        "bits" : 70,
        "word_combination" : "lion elephant"
      },
      {
        "relevance_score" : 66,
        "points" : 52166,
        "bits" : 20,
        "word_combination" : "lion mouse fish cat dog"
      },
      {
        "relevance_score" : 82,
        "points" : 49316,
        "bits" : 810,
        "word_combination" : "elephant cat mouse leopard dog lion"
      },
      {
        "relevance_score" : 87,
        "points" : 127705,
        "bits" : 290,
        "word_combination" : "elephant cat mouse leopard tiger lion"
      }
    ],
    "group" : "whatever"
  },
  {
    "amount" : 11260,
    "words" : [
      {
        "relevance_score" : 0,
        "points" : 37909,
        "bits" : 9000,
        "word_combination" : "elephant cat mouse leopard tiger lion monkey"
      },
      {
        "relevance_score" : 3,
        "points" : 35782,
        "bits" : 540,
        "word_combination" : "elephant"
      }
    ],
    "group" : "whatever"
  }      
]

}

Tags: elasticsearch, kibana

Solution


For the number of elements in the words array, my suggestion is to store that number in an additional field, words_count, at indexing time.

  {
    "amount" : 1140,
    "words_count": 3,                           <--- add this
    "words" : [
      {
        "relevance_score" : 56,
        "points" : 66461,
        "bits" : 100,
        "word_combination" : "cat dog"
      },
      {
        "relevance_score" : 84,
        "points" : 45202,
        "bits" : 990,
        "word_combination" : "cat dog elephant"
      },
      {
        "relevance_score" : 99,
        "points" : 30974,
        "bits" : 70,
        "word_combination" : "elephant cat mouse leopard"
      }
    ],
    "group" : "whatever"
  },
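As a rough illustration of computing that field before indexing, here is a minimal Python sketch. The helper name add_words_count is hypothetical, and the sample document is a trimmed version of the one in the question; it only assumes that each entry of group_words carries a words array.

```python
from copy import deepcopy


def add_words_count(doc):
    """Return a copy of doc with words_count set on every group_words entry.

    Hypothetical helper: run it on each document before sending it to the index.
    """
    enriched = deepcopy(doc)  # leave the caller's document untouched
    for group in enriched.get("group_words", []):
        group["words_count"] = len(group.get("words", []))
    return enriched


# Trimmed sample based on the document structure shown in the question.
doc = {
    "group_words": [
        {
            "amount": 1140,
            "words": [
                {"word_combination": "cat dog"},
                {"word_combination": "cat dog elephant"},
                {"word_combination": "elephant cat mouse leopard"},
            ],
            "group": "whatever",
        },
        {
            "amount": 11260,
            "words": [
                {"word_combination": "elephant cat mouse leopard tiger lion monkey"},
                {"word_combination": "elephant"},
            ],
            "group": "whatever",
        },
    ]
}

enriched = add_words_count(doc)
print([group["words_count"] for group in enriched["group_words"]])
```

Computing the count on the client side keeps the query a plain numeric comparison, at the cost of having to keep words_count in sync whenever the words array changes.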

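For the number of words (or tokens) in the word_combination field, there is a data type called token_count that exists exactly for this purpose. Simply define your mapping like this: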

...
"word_combination": {
  "type": "text",
  "fields": {
    "count": {
      "type": "token_count",
      "analyzer": "standard"
    }
  }
}

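Then, in your query, you can access word_combination.count, which will contain the number of tokens present in the word_combination field (as analyzed by the specified analyzer).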
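Putting the two fields together, a request body for the condition in the question might look like the sketch below. It is built as a plain Python dict so it can be inspected without a cluster. The full field paths (group_words.words_count and group_words.words.word_combination.count) are assumptions derived from the document structure above, and build_count_query is a hypothetical helper. The sketch also assumes group_words and words are ordinary object fields; in that case the two conditions are evaluated across the whole document rather than within a single group, and a mapping that uses the nested type would need nested queries instead.

```python
import json


def build_count_query(min_words=2, min_tokens=3):
    """Build a request body matching documents with more than min_words
    entries in a words array and more than min_tokens tokens in a
    word_combination value.

    Hypothetical helper; the field paths are assumptions, not confirmed
    by the original mapping.
    """
    return {
        "query": {
            "bool": {
                # filter clauses do not affect scoring, which is fine for counting
                "filter": [
                    {"range": {"group_words.words_count": {"gt": min_words}}},
                    {
                        "range": {
                            "group_words.words.word_combination.count": {
                                "gt": min_tokens
                            }
                        }
                    },
                ]
            }
        }
    }


body = build_count_query()
print(json.dumps(body, indent=2))
```

Sending this body to the index's _count endpoint should return the number of matching documents rather than the documents themselves.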

