geo_point mapping fails from Python and StreamSets with Elasticsearch

Problem description

I have this mapping in Elasticsearch:

"mappings": {
          "properties": {
                "fromCoordinates": {"type": "geo_point"},
                "toCoordinates": {"type": "geo_point"},
                "seenCoordinates": {"type": "geo_point"}
            }
        }

From the Kibana console, every format that Elasticsearch supports for geo_point fields works without problems, for example:

(lat, lon)

PUT /anindex/_doc/1
{
  "fromCoordinates": {
    "lat": 36.857200622558594,
    "lon": 117.21600341796875
  },
  "toCoordinates": {
    "lat": 22.639299392700195,
    "lon": 113.81099700927734
  },
  "seenCoordinates": {
    "lat": 36.91663,
    "lon": 117.216
  }
}

(lon, lat)

PUT /anindex/_doc/2
{
 "fromCoordinates": [117.21600341796875, 36.857200622558594],
 "toCoordinates": [113.81099700927734, 22.639299392700195],
 "seenCoordinates": [117.216, 36.91663]
}

However, when I try to insert data into Elasticsearch from Python, I always get this error:

RequestError(400, 'illegal_argument_exception', 'mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]')

In Python I build the JSON from a dictionary. This is the code, followed by what it prints:

fromCoordinates = {}
fromCoordinates['lat'] = fromLat  
fromCoordinates['lon'] = fromLon 

dataDictionary.update({'fromCoordinates': fromCoordinates , 'toCoordinates': toCoordinates, 'seenCoordinates': seenCoordinates})
print(json.dumps(dataDictionary).encode('utf-8'))
{"fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559}, 
"toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266}, 
"seenCoordinates": {"lat": 33.62672, "lon": 109.37243}}

I then load it like this:

data = json.dumps(dataDictionary).encode('utf-8')
es.create(index='anindex', doc_type='document', id=0, body=data)

The array version fails as well:

fromCoordinates = [fromLon, fromLat]

This is the JSON built and printed in Python:

{"fromCoordinates": [113.81099700927734, 22.639299392700195], 
  "toCoordinates": [106.8010025024414, 26.53849983215332], 
   "seenCoordinates": [107.46743, 26.34169]}

In this case I get this response:

RequestError: RequestError(400, 'mapper_parsing_exception', 'geo_point expected')

If I use StreamSets to write to Elasticsearch instead, the same error occurs with both kinds of JSON shown above:

mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]

Any ideas?

Update:

GET /anindex/_mapping
{ "anindex" : 
   { "mappings" : 
     { "properties" : 
       { "fromCoordinates" : 
          { "type" : "geo_point" }, 
        "toCoordinates" : 
           { "type" : "geo_point" }, 
        "seenCoordinates" : { "type" : "geo_point" } 
       }
      }
    }
 }

Solution:

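Following the example given by @jzzfs, I realized that the doc_type parameter in es.create(index='anindex', doc_type='document', id=0, body=data) was causing the error. I removed it and it worked. I still wonder why StreamSets gives the same error, but I will continue with Python.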
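
The fix described above can be sketched as follows, assuming an elasticsearch-py 7.x client like the one used in the question (where `create` accepts `body`). The helper names `geo_point` and `index_document` are hypothetical, and the range check is an added safeguard that the original code did not have. The only point being illustrated is that no `doc_type` is passed.

```python
def geo_point(lat, lon):
    """Return a geo_point in object form, rejecting out-of-range values."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"lat": lat, "lon": lon}


def index_document(es, doc_id, from_point, to_point, seen_point, index="anindex"):
    """Index one document through `es`, an elasticsearch.Elasticsearch client.

    No doc_type is passed; removing it is the change the asker reported
    as the fix.
    """
    body = {
        "fromCoordinates": from_point,
        "toCoordinates": to_point,
        "seenCoordinates": seen_point,
    }
    # A dict can be passed directly; json.dumps(...).encode() is not needed.
    return es.create(index=index, id=doc_id, body=body)
```

With a connected client `es`, a call might look like `index_document(es, 0, geo_point(43.9962005615, 125.684997559), geo_point(40.080101013183594, 116.58499908447266), geo_point(33.62672, 109.37243))`.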

Tags: python, json, elasticsearch, streamsets

Answer


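I suspect that fromCoordinates was first mapped as an object and that you then tried to update the mapping. Try dropping and recreating the index; after that, all of these variants should work: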


Python

from elasticsearch import Elasticsearch
import time

es_instance = Elasticsearch(['http://localhost:9200'])

es_instance.indices.create(
    index='anindex',
    body={"mappings": {
        "properties": {
            "fromCoordinates": {"type": "geo_point"},
            "toCoordinates": {"type": "geo_point"},
            "seenCoordinates": {"type": "geo_point"}
        }
    }})

es_instance.create(
    index="anindex",
    id=0,
    body={
        "fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559},
        "toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266},
        "seenCoordinates": {"lat": 33.62672, "lon": 109.37243}})

es_instance.create(
    index="anindex",
    id=1,
    body={
        "fromCoordinates": [
            117.21600341796875,
            36.857200622558594
        ],
        "toCoordinates": [
            113.81099700927734,
            22.639299392700195
        ],
        "seenCoordinates": [
            117.216,
            36.91663
        ]
    })

# syncing is not instant so wait
time.sleep(1)

print(es_instance.count(index="anindex"))
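
The suspicion above (a field already mapped as something other than geo_point) can be checked before indexing. This is a minimal sketch, assuming the typeless response shape shown by GET /anindex/_mapping in the question; the function name `fields_not_geo_point` is hypothetical.

```python
def fields_not_geo_point(mapping_response, index, fields):
    """Return {field: actual_type} for fields that are not mapped as geo_point.

    `mapping_response` is the dict returned by GET /<index>/_mapping,
    for example from es.indices.get_mapping(index=index).
    A mapped field without an explicit "type" is treated as an object.
    """
    properties = mapping_response[index]["mappings"].get("properties", {})
    problems = {}
    for field in fields:
        spec = properties.get(field)
        if spec is None:
            problems[field] = "unmapped"
        elif spec.get("type", "object") != "geo_point":
            problems[field] = spec.get("type", "object")
    return problems


# The mapping from the question's update: all three fields are geo_point.
mapping = {
    "anindex": {"mappings": {"properties": {
        "fromCoordinates": {"type": "geo_point"},
        "toCoordinates": {"type": "geo_point"},
        "seenCoordinates": {"type": "geo_point"},
    }}}
}
fields = ["fromCoordinates", "toCoordinates", "seenCoordinates"]
print(fields_not_geo_point(mapping, "anindex", fields))  # prints {}
```

A non-empty result would suggest that the index needs to be deleted and recreated with the intended mapping, since the type of an existing field cannot be changed in place.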


Kibana:

DELETE anindex

PUT anindex
{
  "mappings": {
    "properties": {
      "fromCoordinates": {
        "type": "geo_point"
      },
      "toCoordinates": {
        "type": "geo_point"
      },
      "seenCoordinates": {
        "type": "geo_point"
      }
    }
  }
}

PUT /anindex/_doc/1
{
  "fromCoordinates": {
    "lat": 36.857200622558594,
    "lon": 117.21600341796875
  },
  "toCoordinates": {
    "lat": 22.639299392700195,
    "lon": 113.81099700927734
  },
  "seenCoordinates": {
    "lat": 36.91663,
    "lon": 117.216
  }
}

PUT /anindex/_doc/2
{
  "fromCoordinates": [
    117.21600341796875,
    36.857200622558594
  ],
  "toCoordinates": [
    113.81099700927734,
    22.639299392700195
  ],
  "seenCoordinates": [
    117.216,
    36.91663
  ]
}

PUT anindex/_doc/3
{
  "fromCoordinates": "22.639299392700195,113.81099700927734",
  "toCoordinates": "26.53849983215332,106.8010025024414",
  "seenCoordinates": "26.34169,107.46743"
}
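
The three documents above use different notations, and the axis order differs between them: the object and string forms put latitude first, while the array form puts longitude first. The sketch below normalizes each notation to a (lat, lon) tuple so the order can be checked on the client side. `to_lat_lon` is a hypothetical helper, not part of the Elasticsearch client, and it deliberately ignores the geohash and WKT notations.

```python
def to_lat_lon(value):
    """Normalize a geo_point in object, string, or array form to (lat, lon)."""
    if isinstance(value, dict):
        # Object form: {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, str):
        # String form: "lat,lon"
        lat, lon = (float(part) for part in value.split(","))
        return lat, lon
    if isinstance(value, (list, tuple)) and len(value) == 2:
        # Array form: [lon, lat], longitude first
        lon, lat = value
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point notation: {value!r}")


# The fromCoordinates values of documents 1 and 2 above describe the same point.
as_object = {"lat": 36.857200622558594, "lon": 117.21600341796875}
as_array = [117.21600341796875, 36.857200622558594]
print(to_lat_lon(as_object) == to_lat_lon(as_array))  # prints True
```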
