Facing data loss while using the JDBC source connector in Kafka Connect

Problem description

For the past 5 months we have been using Apache Kafka with the JDBC source connector to pull data from a Redshift database.
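For context, a typical JDBC source connector configuration for this kind of Redshift pipeline looks roughly like the sketch below. This is an illustrative fragment, not the poster's actual config: the connector name is taken from the log, but the connection URL, table, and mode settings are assumptions.

```json
{
  "name": "SAMPLE-CONNECTOR",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:redshift://<cluster-endpoint>:5439/<database>",
    "connection.user": "<user>",
    "connection.password": "<password>",
    "table.whitelist": "<source_table>",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "<updated_at_column>",
    "incrementing.column.name": "<id_column>",
    "topic.prefix": "redshift-",
    "poll.interval.ms": 5000
  }
}
```

The `mode` setting is worth noting when investigating missing rows: in plain `timestamp` mode, rows that share a timestamp equal to the last committed offset can be silently skipped, which is why `timestamp+incrementing` is commonly recommended when a strictly increasing ID column is available.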

But now we are facing a serious data loss problem: out of about 25M records, we are receiving only about 9M.

While checking the connect logs, we observed that data is being flushed at regular intervals.

Below are the entries we see in connect.log:

    [2021-04-28 15:34:39,485] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 915 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
    [2021-04-28 15:34:39,535] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Finished commitOffsets successfully in 50 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
    [2021-04-28 15:35:29,535] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
    [2021-04-28 15:35:29,536] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 1160 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
    [2021-04-28 15:35:29,613] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Finished commitOffsets successfully in 77 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
    [2021-04-28 15:36:19,613] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
    [2021-04-28 15:36:19,613] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 1280 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
    [2021-04-28 15:36:19,679] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Finished commitOffsets successfully in 66 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
    [2021-04-28 15:37:09,680] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
    [2021-04-28 15:37:09,680] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 1628 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
    [2021-04-28 15:37:09,781] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Finished commitOffsets successfully in 101 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
    [2021-04-28 15:37:59,782] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
    [2021-04-28 15:37:59,782] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 29 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
    [2021-04-28 15:37:59,

In this log, batches of records are being flushed at regular intervals, so we think this might be the problem?
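One way to sanity-check the log is to total the counts reported in the "flushing N outstanding messages" lines and compare the sum against the expected record count; a minimal sketch (the log format is taken from the entries above, the file path is an assumption):

```python
import re

# Matches the worker's offset-commit flush entries, e.g.
# "... flushing 915 outstanding messages for offset commit ..."
FLUSH_RE = re.compile(r"flushing (\d+) outstanding messages")

def total_flushed(log_lines):
    """Sum the message counts reported in offset-commit flush entries."""
    total = 0
    for line in log_lines:
        m = FLUSH_RE.search(line)
        if m:
            total += int(m.group(1))
    return total

sample = [
    "[2021-04-28 15:34:39,485] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 915 outstanding messages for offset commit",
    "[2021-04-28 15:35:29,536] INFO WorkerSourceTask{id=SAMPLE-CONNECTOR-0} flushing 1160 outstanding messages for offset commit",
]
print(total_flushed(sample))  # 2075

# Against the real file (hypothetical path):
# with open("connect.log") as f:
#     print(total_flushed(f))
```

Note that these flush lines by themselves are normal: they are the worker's periodic offset commits, not an error.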

Has anyone faced the same issue before?

Does anyone know how to overcome this, or what the root cause of this issue might be?

Tags: apache-kafka, amazon-redshift, apache-kafka-connect

Solution
