首页 > 解决方案 > Can messages be lost at kafka producer side during deployment?

问题描述

We are facing a peculiar issue and seeing that when we produce messages to Kafka, it is sometimes not being found at the consumer end. We tried to debug this further and enabled the onSuccess() and onFailure() callbacks. We got that major issue was -

org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.

To solve for this, we increased retries to 10 and it helped fix the issue almost completely.

However, we found 3 msgs(each at a different time) for which we neither had an onSuccess() or onFailure() callback. It just got lost in communication, so to say!

Now, this happened just before the application was taken down for redeployment. I understand from the Kafka Producer Config, the default batch size is 16KB and it waits for the batch to be filled before actually sending the message to the broker (I have deliberately taken out linger.ms consideration for simplicity).

My question is, can it happen that all the message in Kafka batch is getting lost when the system is forcefully being shutdown for deployment? If yes, how do we address this issue?

Please help me here as this is the issue that we are facing in production.

Many thanks in advance!

标签: javaapache-kafkaspring-kafkakafka-producer-api

解决方案


If you are using batching and the server dies (kill -9, System.exit(), or power failure) you can lose messages.

If you are using Spring Boot and perform an orderly shutdown (Ctrl-C) or otherwise close the Spring ApplicationContext (e.g. with a ShutDownHook), you should not lose anything because the producer will be closed during the context close, forcing a push of the partial batch.

If the pending sends can't be completed, you should see a log message:

log.info("Proceeding to force close the producer since pending requests could not be completed " +
                "within timeout {} ms.", timeoutMs);

You can see the close() code in KafkaProducer.


推荐阅读