java - Can messages be lost at kafka producer side during deployment?
问题描述
We are facing a peculiar issue and seeing that when we produce messages to Kafka, it is sometimes not being found at the consumer end. We tried to debug this further and enabled the onSuccess() and onFailure() callbacks. We got that major issue was -
org.springframework.kafka.core.KafkaProducerException: Failed to send; nested exception is org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
To solve for this, we increased retries to 10 and it helped fix the issue almost completely.
However, we found 3 msgs(each at a different time) for which we neither had an onSuccess() or onFailure() callback. It just got lost in communication, so to say!
Now, this happened just before the application was taken down for redeployment. I understand from the Kafka Producer Config, the default batch size is 16KB and it waits for the batch to be filled before actually sending the message to the broker (I have deliberately taken out linger.ms consideration for simplicity).
My question is, can it happen that all the message in Kafka batch is getting lost when the system is forcefully being shutdown for deployment? If yes, how do we address this issue?
Please help me here as this is the issue that we are facing in production.
Many thanks in advance!
解决方案
If you are using batching and the server dies (kill -9, System.exit()
, or power failure) you can lose messages.
If you are using Spring Boot and perform an orderly shutdown (Ctrl-C) or otherwise close the Spring ApplicationContext (e.g. with a ShutDownHook), you should not lose anything because the producer will be closed during the context close, forcing a push of the partial batch.
If the pending sends can't be completed, you should see a log message:
log.info("Proceeding to force close the producer since pending requests could not be completed " +
"within timeout {} ms.", timeoutMs);
You can see the close() code in KafkaProducer
.
推荐阅读
- java - 频繁的“偏移超出范围”消息,消费者遗弃的分区
- java - 从 Google 电子表格获取数据到我的应用程序的代码突然停止工作
- python - 在 Python 中发送发布请求以登录网站所涉及的步骤
- selenium - 我们应该在 @BeforeClass 中使用 assert 还是在 TestNG 中使用 @BeforeTest 注释,这样做是一个好习惯吗?
- java - 如何为 Firebase 数据库异常添加内置类的无参数构造函数?
- java - 事件监听器的控制流
- python-3.x - 用另一个值更新熊猫数据框的正确方法
- java - login RESTFUL api,它在登录成功时返回一个 id
- python - Selenium 超时异常错误
- api-platform.com - 在 api-platform 中重命名自定义操作