Elasticsearch: how to exclude fields from a dynamic mapping

Problem description

I am using Elasticsearch version 7.

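My goal is to define a mapping made of three parts:

  1. A dynamic part (all strings and arrays are added dynamically).
  2. A static mapping for all other known fields.
  3. A reject part: known fields that must be kept out of the dynamic mapping.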

I got the dynamic mapping and the static part (parts 1 and 2) working, but I have not been able to create the reject part.

Sample document:

  {
    "_index": "index_test",
    "_type": "_doc",
    "_id": "8ed19f94-11e2-417c-998d-a21275d7fb18",
    "_score": 1.0,
    "_source": {
      /* ********** known static mapping as free text ********** */
      "source": "S3 Large Bucket of Files",
      "scanner_type_group": "unstructured",
      "my_type": "file",
      /* ********** dynamic fields ********** */
      "dynamic_field_1": "S3 Large Bucket of Files",
      "dynamic_field_2": " foo 123",
      "fullyQualifiedName": "S3 Large Bucket of Files.hyper-scan-test",
      "columnOrFieldOccurrencesCounter": [
        {
          "fieldName": "fileContent"
        }
      ],
      /* ********** ignore these fields ********** */
      "longNumber": 1622721252657,
      "floatNumber": 1.6227213E12,
      "discard1": " This static field should be ignored!!!",
      "discard2": " This static field should be ignored too!!!",
      "discard_array": [" aaa", "sss"]
    }
  }

So far I have created the mapping and settings, but without the reject part:

{
  "dynamic_templates": [
    {
      "strings_as_keywords": {
        "match_mapping_type": "string",
        "mapping": {
          "type": "text",
          "analyzer": "autocomplete"
        }
      }
    }
  ],
  "properties": {
    "source": { "type": "keyword" },
    "scanner_type_group": { "type": "keyword" },
    "my_type": { "type": "keyword" }
  }
  // How can I keep the following from being indexed into ES?
  // 1. Ignore all numeric types in the dynamic mapping.
  // 2. Ignore other known fields by name, and the array named discard_array.
  // 3. Choose whether an ignored field is stored with or without an index.
}

Settings for the text analyzer (these work):

{
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "ngram",
        "min_gram": 2,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "autocomplete_filter"
        ]
      }
    }
  },
  "max_ngram_diff": 30,
  "index": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  }
}

Any suggestions on how to ignore numeric types and specific fields and arrays (as marked in the JSON above)?

Thanks

Tags: elasticsearch, elastic-stack

Solution
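No answer text is preserved here, so what follows is only a sketch of one possible approach, not a confirmed solution. It assumes that "ignore" means "keep the value in _source but build no index structures for it". Under that assumption, numeric fields detected dynamically can be caught by extra dynamic templates that set "index": false and "doc_values": false, and the known discard fields can be mapped explicitly with the same two parameters. The template names numbers_long_not_indexed and numbers_double_not_indexed are made up for illustration. The // comments follow the convention used in the question and may need to be removed before the body is sent.

```json
{
  "dynamic_templates": [
    {
      // Hypothetical template name. Assumption: whole numbers found
      // dynamically should stay in _source but not be searchable.
      "numbers_long_not_indexed": {
        "match_mapping_type": "long",
        "mapping": { "type": "long", "index": false, "doc_values": false }
      }
    },
    {
      // Hypothetical template name. Same idea for floating-point numbers.
      "numbers_double_not_indexed": {
        "match_mapping_type": "double",
        "mapping": { "type": "double", "index": false, "doc_values": false }
      }
    },
    {
      // Unchanged from the question: dynamic strings use the autocomplete analyzer.
      "strings_as_keywords": {
        "match_mapping_type": "string",
        "mapping": { "type": "text", "analyzer": "autocomplete" }
      }
    }
  ],
  "properties": {
    "source": { "type": "keyword" },
    "scanner_type_group": { "type": "keyword" },
    "my_type": { "type": "keyword" },
    // Explicit mappings take precedence over dynamic templates, so these
    // fields never reach strings_as_keywords. An array of strings is
    // handled by the same keyword mapping.
    "discard1": { "type": "keyword", "index": false, "doc_values": false },
    "discard2": { "type": "keyword", "index": false, "doc_values": false },
    "discard_array": { "type": "keyword", "index": false, "doc_values": false }
  }
}
```

With a mapping along these lines, the discard fields and the dynamically detected numbers would still be returned as part of _source, but queries against them would not match. If the values should not be stored at all, the _source "excludes" option in the mapping may be worth evaluating, with the trade-off that the excluded fields are then unavailable for reindexing.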

