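mongodb - Unable to send data from a Kafka topic to Elasticsearch
Problem description
I am trying to build a data pipeline from MongoDB to a Kafka topic, using MongoDB as the source, Elasticsearch as the sink, and Kafka in between. I have successfully received data from MongoDB in my Kafka topic. Here is a sample of the data captured from MongoDB: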
{"_id": {"_data": "825E88FED8000000012B022C0100296E5A10044D2CA180FAF94580B30CFA4B3CC80E1546645F696400645E88FED793AFA61A58411B2A0004"}, "operationType": "insert", "clusterTime": {"$timestamp": {"t": 1586036440, "i": 1}}, "fullDocument": {"_id": {"$oid": "5e88fed793afa61a58411b2a"}, "name": "Lefèvre Mathis", "phoneNumber": 87640262, "phoneNumber2": 98462768, "phoneNumber3": 50591075, "email": "LefèvreMathis@gmail.com", "websiteUrl": "www.LefèvreMathis.fr", "legalInformation": {"companyName": "Duval EI", "siren": 7.3887975858196E13, "nic": 28866, "siret": 7.3887975858196E13, "ape": "49.53", "tva": "FR-1173030343", "description": "Blanditiis et placeat voluptas hic et. Quae et autem inventore ut enim fugit. Nihil velit in ut magnam."}, "professionType": {"type": "Hotel", "category": "professionnel"}, "operator": {"name": "Orange"}, "address": [{"city": "Paris", "street": "Quartier Les Halles, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France", "zipCode": 75001, "latitude": "48.86330665", "longitude": "2.348370623761905"}], "openingTimeSet": [{"day": "Lundi", "opening": "08:00", "closing": "18:00"}, {"day": "Mardi", "opening": "08:00", "closing": "18:00"}, {"day": "Mercredi", "opening": "08:00", "closing": "18:00"}, {"day": "Jeudi", "opening": "08:00", "closing": "18:00"}, {"day": "Vendredi", "opening": "08:00", "closing": "18:00"}, {"day": "Samedi", "opening": "08:00", "closing": "18:00"}, {"day": "Dimanche", "opening": "08:00", "closing": "18:00"}], "_class": "com.sofrecom.elasticsearch.model.Subscriber"}, "ns": {"db": "elasticsearchApp", "coll": "subscriber"}, "documentKey": {"_id": {"$oid": "5e88fed793afa61a58411b2a"}}}
The problem is that when I run my ES sink connector, I get this exception:
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:355)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:485)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: java.io.CharConversionException: Invalid UTF-32 character 0x658b027b (above 0x0010ffff) at char #1, byte #7)
Here is my kafka-connect configuration:
CONNECT_BOOTSTRAP_SERVERS: kafka:9092
CONNECT_REST_ADVERTISED_HOST_NAME: connect
CONNECT_REST_PORT: 8083
CONNECT_GROUP_ID: compose-connect-group
CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_PLUGIN_PATH: '/usr/share/java,/etc/kafka-connect/jars'
CONNECT_CONFLUENT_TOPIC_REPLICATION_FACTOR: 1
My es-sink connector:
{ "name": "sink", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "connection.url": "http://172.21.0.4:9200", "type.name": "subscriber", "topics": "test5.elasticsearchApp.subscriber", "key.ignore": "false","value.converter.schemas.enable": "false","schema.ignore": "true","value.converter":"org.apache.kafka.connect.json.JsonConverter" } }
And my mongodb-source-connector:
{ "name": "mongo-source", "config": { "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector","tasks.max":1,"connection.uri":"mongodb://mongo1:27017,mongo2:27017","database":"elasticsearchApp","collection":"subscriber", "topic.prefix":"test15","value.converter":"org.apache.kafka.connect.storage.StringConverter"} }
When I tried using the JSON converter in my MongoDB connector instead, the payload arrived as a string when I consumed from the Kafka topic:
{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"825E89EA94000000012B022C0100296E5A10044D2CA180FAF94580B30CFA4B3CC80E1546645F696400645E89EA94FC56002500157F490004\"}, \"operationType\": \"insert\", \"clusterTime\": {\"$timestamp\": {\"t\": 1586096788, \"i\": 1}}, \"fullDocument\": {\"_id\": {\"$oid\": \"5e89ea94fc56002500157f49\"}, \"name\": \"Lefèvre Mathis\", \"phoneNumber\": 87640262, \"phoneNumber2\": 98462768, \"phoneNumber3\": 50591075, \"email\": \"LefèvreMathis@gmail.com\", \"websiteUrl\": \"www.LefèvreMathis.fr\", \"legalInformation\": {\"companyName\": \"Duval EI\", \"siren\": 7.3887975858196E13, \"nic\": 28866, \"siret\": 7.3887975858196E13, \"ape\": \"49.53\", \"tva\": \"FR-1173030343\", \"description\": \"Blanditiis et placeat voluptas hic et. Quae et autem inventore ut enim fugit. Nihil velit in ut magnam.\"}, \"professionType\": {\"type\": \"Hotel\", \"category\": \"professionnel\"}, \"operator\": {\"name\": \"Orange\"}, \"address\": [{\"city\": \"Paris\", \"street\": \"Quartier Les Halles, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France\", \"zipCode\": 75001, \"latitude\": \"48.86330665\", \"longitude\": \"2.348370623761905\"}], \"openingTimeSet\": [{\"day\": \"Lundi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Mardi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Mercredi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Jeudi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Vendredi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Samedi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Dimanche\", \"opening\": \"08:00\", \"closing\": \"18:00\"}], \"_class\": \"com.sofrecom.elasticsearch.model.Subscriber\"}, \"ns\": {\"db\": \"elasticsearchApp\", \"coll\": \"subscriber\"}, \"documentKey\": {\"_id\": {\"$oid\": \"5e89ea94fc56002500157f49\"}}}"}
Solution
If you do not want the Mongo connector to produce a string payload, do not use this:
"value.converter":"org.apache.kafka.connect.storage.StringConverter"
You will need to use "value.converter.schemas.enable": "true" in the sink, because the JSON in the topic contains both schema and payload.
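As a rough sketch of that change, the snippet below rebuilds the sink connector definition from the question with "value.converter.schemas.enable" switched to "true" and prints the JSON body that could be submitted to Kafka Connect. All other values are copied from the question as-is; the variable name sink_config is only illustrative, and this has not been run against a real cluster.

```python
import json

# Sink connector definition from the question, with schemas.enable set to "true"
# so the JsonConverter expects the {"schema": ..., "payload": ...} envelope.
sink_config = {
    "name": "sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://172.21.0.4:9200",
        "type.name": "subscriber",
        "topics": "test5.elasticsearchApp.subscriber",
        "key.ignore": "false",
        "schema.ignore": "true",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Changed from "false" in the question.
        "value.converter.schemas.enable": "true",
    },
}

# Print the JSON body for the connector definition.
print(json.dumps(sink_config, indent=2))
```

This only applies when the records in the topic were written with the JSON converter, as in the second sample in the question; records written with the StringConverter carry no schema envelope.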
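You will also need to use Elasticsearch index mappings to parse the string, because Connect will not do that for you.
I am not sure whether there is a bug in the Mongo connector. I have never used it, but I would expect the JSON Converter to work, or at least Avro.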
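To see why the string still has to be parsed separately, here is a small Python illustration using a shortened stand-in for the record in the question. It is not what Connect or Elasticsearch runs internally; it only shows that decoding the envelope yields a string payload, and that a second parse is needed to reach the document fields.

```python
import json

# Shortened stand-in for the record value in the question: the envelope declares
# the payload as a string, and that string happens to contain JSON text.
record_value = json.dumps({
    "schema": {"type": "string", "optional": False},
    "payload": json.dumps({
        "operationType": "insert",
        "fullDocument": {"name": "Lefèvre Mathis", "phoneNumber": 87640262},
    }),
})

# First parse: unwrap the schema/payload envelope.
envelope = json.loads(record_value)
payload = envelope["payload"]
print(type(payload).__name__)  # the payload is still a plain string here

# Second parse: only now do the document fields become accessible.
document = json.loads(payload)
print(document["fullDocument"]["name"])
```

Unwrapping the envelope corresponds to the first parse only, which is why the sink ends up with a single string value rather than a structured document.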