Adding new documents to a separate index using an Elasticsearch processor

Question

Is there a way to populate a separate index when I index some documents?

Let's say I have something like:

PUT person/_doc/1
{
  "name": "John Doe",
  "languages": ["english", "spanish"]
}

PUT person/_doc/2
{
  "name": "Jane Doe",
  "languages": ["english", "russian"]
}

What I want is that every time a person is added, each of their languages is added to a languages index.

Something like:

GET languages/_search

would give:

...
"hits" : [
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "value" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "value" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "value" : "spanish"
    }
  }
...

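I thought about pipelines, but I don't see any processor that allows this.

Maybe the answer is to create a custom processor. I already have one, but I'm not sure how to insert a document into a separate index from it.

Update: Using transforms as described in @Val's answer works, and it does seem to be the right answer...

However, I am using Open Distro for Elasticsearch, and transforms are not available there. An alternative solution that works there would be greatly appreciated :)

Update 2: It looks like OpenSearch is replacing Open Distro for Elasticsearch, and it has a transform API \o/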

Tags: elasticsearch, elasticsearch-opendistro, opensearch

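Answer

A document entering an ingest pipeline cannot be cloned or split the way it can in Logstash. So from a single incoming document, you cannot index two documents.

However, right after indexing your person documents, it is definitely possible to hit the _transform API endpoint and create the languages index from the person index.

First, create the transform: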

PUT _transform/languages-transform
{
  "source": {
    "index": "person"
  },
  "pivot": {
    "group_by": {
      "language": {
        "terms": {
          "field": "languages.keyword"
        }
      }
    },
    "aggregations": {
      "count": {
        "value_count": {
          "field": "languages.keyword"
        }
      }
    }
  },
  "dest": {
    "index": "languages",
    "pipeline": "set-id"
  }
}
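
To make it clearer what this pivot computes, here is a minimal Python sketch that mimics it in memory. It is not how Elasticsearch implements transforms, and the function name pivot_languages is made up for illustration. It assumes, as far as I understand the aggregation semantics, that a person falls into one bucket per distinct language and that value_count counts every value of the field in the bucket's documents rather than the documents themselves.

```python
from collections import defaultdict


def pivot_languages(persons):
    """Mimic the pivot: group by each language, value_count on the same field."""
    counts = defaultdict(int)
    for person in persons:
        # Distinct values of the multi-valued field for this document.
        distinct = set(person.get("languages", []))
        for language in distinct:
            # The document lands in the bucket of each of its languages, and
            # value_count adds all of the document's values to that bucket.
            counts[language] += len(distinct)
    return [{"language": lang, "count": counts[lang]} for lang in sorted(counts)]


persons = [
    {"name": "John Doe", "languages": ["english", "spanish"]},
    {"name": "Jane Doe", "languages": ["english", "russian"]},
]
result = pivot_languages(persons)
```

Under those assumptions, the two sample persons from the question give a count of 4 for english and 2 for each of russian and spanish, so count is not the number of persons speaking a language.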

You also need to create the pipeline that sets the proper ID for your language documents:

PUT _ingest/pipeline/set-id
{
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{language}}"
      }
    }
  ]
}
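
As a rough illustration of what the set processor does with the {{language}} template, the following Python sketch renders the placeholder from the document's own fields and uses the result as the ID. The render and set_id helpers are hypothetical and only handle plain top-level {{field}} placeholders. The real processor uses Mustache templates.

```python
import re


def render(template, source):
    """Replace each {{field}} placeholder with the matching top-level field."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(source[match.group(1)]),
        template,
    )


def set_id(source, template="{{language}}"):
    """Return the document as it would be written: _id taken from its language."""
    return {"_id": render(template, source), "_source": source}


doc = set_id({"language": "english", "count": 4})
```

Here doc["_id"] is "english", which is what makes the documents in the languages index addressable by language name.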

Then you can start the transform:

POST _transform/languages-transform/_start

Once it has finished, you'll have a new index called languages with the following content:

GET languages/_search
=>
"hits" : [
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "count" : 4,
      "language" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "spanish"
    }
  }
]

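Note that you can also set that transform on a schedule so that it runs regularly, or you can run it manually whenever it suits you, to rebuild the languages index.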
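
One way to get the scheduled behaviour in Elasticsearch is a continuous transform, which adds frequency and sync settings to the same definition. The sketch below only builds the request body as a Python dict. It assumes the person documents carry a date field, here called created_at, which is not in the original example and is purely hypothetical. The helper name continuous_transform_body is also made up.

```python
import json


def continuous_transform_body(timestamp_field, frequency="1m", delay="60s"):
    """Build a transform body that keeps the languages index in sync."""
    return {
        "source": {"index": "person"},
        "pivot": {
            "group_by": {
                "language": {"terms": {"field": "languages.keyword"}}
            },
            "aggregations": {
                "count": {"value_count": {"field": "languages.keyword"}}
            },
        },
        "dest": {"index": "languages", "pipeline": "set-id"},
        # How often the transform checks the source index for changes.
        "frequency": frequency,
        # Changed documents are detected through this date field (assumed to exist).
        "sync": {"time": {"field": timestamp_field, "delay": delay}},
    }


body = continuous_transform_body("created_at")
payload = json.dumps(body, indent=2)
```

The resulting payload would be sent with PUT _transform/languages-transform in place of the one-off definition, and the transform would then be started in the same way.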


OpenSearch has its own _transform API. It works slightly differently; a transform can be created like this:

PUT _plugins/_transform/languages-transform
{
  "transform": {
    "enabled": true,
    "description": "Insert languages",
    "schedule": {
      "interval": {
        "period": 1,
        "unit": "minutes"
      }
    },
    "source_index": "person",
    "target_index": "languages",
    "data_selection_query": {
      "match_all": {}
    },
    "page_size": 1,
    "groups": [{
      "terms": {
        "source_field": "languages.keyword",
        "target_field": "value"
      }
    }]
  }
}
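
For clusters where no transform API is available at all (as with Open Distro in the question), the fan-out has to happen on the client side, since an ingest pipeline cannot emit a second document. The following is a minimal sketch of that idea and is not part of the original answer: it builds a _bulk request body that indexes each person and one document per language, using the language as the document ID so that repeated writes overwrite rather than duplicate. The function name build_bulk_body is hypothetical.

```python
import json


def build_bulk_body(persons):
    """Return an NDJSON _bulk body for the persons and their languages."""
    lines = []
    seen = set()
    for person_id, person in persons.items():
        lines.append({"index": {"_index": "person", "_id": person_id}})
        lines.append(person)
        for language in person.get("languages", []):
            if language in seen:
                continue  # already emitted in this batch
            seen.add(language)
            # The language is the _id, so indexing it again just overwrites it.
            lines.append({"index": {"_index": "languages", "_id": language}})
            lines.append({"value": language})
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


bulk_body = build_bulk_body({
    "1": {"name": "John Doe", "languages": ["english", "spanish"]},
    "2": {"name": "Jane Doe", "languages": ["english", "russian"]},
})
```

The body would be sent with POST _bulk and the Content-Type application/x-ndjson. The trade-off is that every writer to the person index has to go through this code path.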
